Research paper: Artificial Intelligence for Drug Discovery: Are We There Yet?

Note: This is not my work. I am a Lemmy user just like you. I receive your comments. I have nothing to gain in posting this. You directly impact future efforts with your interactions. This is a cherry picked section of a larger paper and is representative of something I found interesting. It is not complete or representative of the author, publication, or institutions, and all errors are my own. This text was extracted with OCR.

(PREPUBLISH)

Hasselgren C, Oprea TI. 2024. Annu. Rev. Pharmacol. Toxicol. Accepted. DOI: 10.1146/annurev-pharmtox-040323-040828

Annual Reviews in Pharmacology and Toxicology, v.64, 2024

“…(section 5 p17.)…

Challenges and limitations of AI in drug discovery

“Artificial intelligence in drug discovery” might give the impression that AI is successfully employed in early drug discovery. While there is some truth to this statement, it’s important to note that no medicines approved by regulatory agencies can be attributed to AI in the same way AI has achieved victories in chess, Go, Jeopardy!, autonomous vehicles, or poetry generation. Drug discovery is a complex, multifaceted process, as captured by the 4DM charts. Robot Scientist Adam independently conducted genomic experiments (164), and Robot Scientist Eve performed an HTS campaign to identify anti-malarial compounds (165). Although computer-aided processes have been used for compound selection and optimization, no AI-driven Robot Scientist or digital equivalent currently exists that can execute fully automated drug discovery. Automated AI-driven drug discovery remains an aspirational goal (166). Most success stories to date have relied on machine learning, cheminformatics, bioinformatics software, natural language processing, or other computational platforms that support human decision-making. In summary, drug discovery has yet to benefit from a comprehensive AI system.

One of the weak aspects of AI4DD for small molecules is the training of ML models that encode chemical features (often referred to as QSARs), such as those derived from chemical structures. Bohacek et al. estimated (167) the number of ‘drug-like’ chemicals to be up to 10^60, and virtual screening libraries have already exceeded 30 billion compounds (94). The logistical and practical challenges of virtually screening 30 billion compounds, considering an estimated 10-50 conformers per molecule, amounting to nearly 500 billion objects, are immense and beyond the scope of this review. Instead, our focus is on the practical issues related to the applicability domain (168) and external predictivity validation (169). Both validation and applicability are challenges faced by target-based KG machine learning models, as mentioned earlier.

Machine learning models commonly used in AI4DD are often trained on tens of thousands of compounds or less, which raises questions about their effectiveness in sampling the chemical space of 30 billion compounds. Can we confidently assume that such comparatively small training sets effectively represent the chemical space of 30 billion? Are such predictions trustworthy? Both the applicability domain (a representational issue in ML feature space) and chemical diversity (unseen scaffolds are less likely to produce reliable predictions even within the applicability domain) raise concerns about the predictivity of ML models for the unexplored “chemical universe.” Ideally, AI4DD practitioners would want to systematically sample chemical space using adequately trained ML models. This becomes imperative during lead optimization, where progress depends on the accurate representation of relevant chemical scaffolds in the ML space.
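
[Illustration added to this post, not taken from the paper.] One simple way to make the applicability-domain idea concrete is a nearest-neighbor check: a query compound counts as "in domain" only if it is sufficiently similar to at least one training compound. The sketch below assumes compounds are already encoded as sets of fingerprint bit indices; the names `tanimoto`, `nearest_similarity`, and `in_domain`, the toy fingerprints, and the 0.3 cutoff are all hypothetical choices for illustration, and published applicability-domain methods differ in detail.

```python
from typing import FrozenSet, Iterable

# Assumption: each compound is represented by the indices of the "on" bits
# of some structural fingerprint. The toy values below are invented.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here: two empty fingerprints count as dissimilar
    return len(a & b) / len(a | b)


def nearest_similarity(query: Fingerprint, training: Iterable[Fingerprint]) -> float:
    """Highest similarity between the query and any training compound."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query: Fingerprint, training: Iterable[Fingerprint],
              threshold: float = 0.3) -> bool:
    """Flag a query as inside the applicability domain.

    The 0.3 threshold is an arbitrary illustrative value, not a recommendation.
    """
    return nearest_similarity(query, training) >= threshold


training_set = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6})]
close_query = frozenset({1, 2, 3, 7})  # shares 3 of 5 distinct bits with the first training compound
far_query = frozenset({20, 21, 22})    # shares no bits with any training compound

print(in_domain(close_query, training_set))  # True
print(in_domain(far_query, training_set))    # False
```

A check like this only addresses the representational side of the problem. As the paragraph above notes, an unseen scaffold can still yield an unreliable prediction even when it passes a similarity cutoff.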

Target identification in drug discovery is also impacted by the so-called reproducibility crisis (170). Bayer (171) and Amgen (172) have reported low reproducibility rates (33% and 11%, respectively) of high-impact publications, and many biomedical publications are false (173). From an AI perspective, filtering out false data requires a coordinated community effort. Lessons (174) from eLife’s Reproducibility Project, which focused on cancer biology, highlight issues like weaker-than-previously-published (175) effects and inaccurate protocol descriptions (176), among others. The possibility of indexing fabricated publications (177) or those generated by “research paper mills” (178) further increases the likelihood of false information in the field. These challenges with experimental data compound the issues of ML model accuracy and the ML science reproducibility crisis (76). For AI4DD to be effective, it needs to be anchored in truth.

Another subtle risk involves the education of scientists in using AI4DD models effectively. Questions about when, how, and in what order to deploy ML models are crucial. The proper use of ML models depends on the specific requirements of each unique drug discovery project. For some projects, target selectivity and appropriate in-tissue delivery might be more important than absolute affinity or systemic toxicity. In contrast, other projects might focus on mitigating on-target toxicity, low permeability, or scaffold similarity to competitor patents. Each issue demands different computational solutions, ranging from filters and lead hopping to sequential ML model deployment. Proper training in using AI4DD models is critical to ensure that scientists can effectively navigate these complexities and make informed decisions based on the specific needs of their drug discovery projects.

The human component of drug discovery is another crucial aspect. In many academic and industrial settings, decision-making falls to medicinal chemists who typically rely on their judgment to propose compounds for the design-make-test (DMT) cycle rather than depending solely on AI. Compounds may be thoroughly evaluated by the project team, with members voting on the order in which chemicals should be synthesized and tested. In AI-integrated companies, AI may influence this process, but chemists are still likely to veto compounds that don’t meet specific criteria, even if computational chemists or toxicologists find no issues. It is reasonable to assume that user expertise, bias, and time constraints play a significant role in early drug discovery, often more so than AI. The Pfizer “rule of 5” (Ro5 or Lipinski rules) serves as an early example (179) of attempts to integrate informatics and data science into the early stages of drug discovery. The Ro5 criteria, assessing hydrogen bonding capacity, the calculated octanol/water partitioning coefficient (logP), and molecular weight, have been employed world-wide to narrow down the chemical solution space. It is undeniable that Ro5 criteria have had a significant impact on medicinal chemistry (180). However, the influence of these criteria is gradually diminishing over time (181). …”
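
For readers unfamiliar with the Ro5 mentioned at the end of the excerpt, here is a minimal sketch of how such a filter is commonly written. It is my own illustration, not code from the paper. It uses the commonly cited Lipinski cutoffs (molecular weight over 500, calculated logP over 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors). The `Descriptors` class, the function names, the "at most one violation" convention, and the example descriptor values are assumptions made for illustration; in practice the descriptors would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container for this sketch)."""
    mol_weight: float       # molecular weight in daltons
    logp: float             # calculated octanol/water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-5 cutoffs a compound exceeds."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most one violation."""
    return ro5_violations(d) <= max_violations


# Invented descriptor values, not measurements of any real compound.
small = Descriptors(mol_weight=350.0, logp=2.5, h_bond_donors=2, h_bond_acceptors=5)
large = Descriptors(mol_weight=720.0, logp=6.1, h_bond_donors=6, h_bond_acceptors=12)

print(ro5_violations(small), passes_ro5(small))  # 0 True
print(ro5_violations(large), passes_ro5(large))  # 4 False
```

A filter like this is cheap to run over a large library, which is part of why it spread so widely, but it only narrows the search space and says nothing about potency, selectivity, or toxicity.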