Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2

Born, Jannis and Manica, Matteo and Cadow, Joris and Markert, Greta and Mill, Nil Adell and Filipavicius, Modestas and Janakarajan, Nikita and Cardinale, Antonio and Laino, Teodoro and Rodríguez Martínez, María (2021) Data-driven molecular design for discovery and synthesis of novel ligands: a case study on SARS-CoV-2. Machine Learning: Science and Technology, 2 (2). 025024. ISSN 2632-2153

Full text: Born_2021_Mach._Learn.__Sci._Technol._2_025024.pdf (Published Version, 844kB)

Authors (with ORCID where listed)

Jannis Born, http://orcid.org/0000-0001-8307-5670
Matteo Manica, http://orcid.org/0000-0002-8872-0269
Joris Cadow, http://orcid.org/0000-0002-4410-2805
Greta Markert, http://orcid.org/0000-0001-5254-5596
Nil Adell Mill, http://orcid.org/0000-0003-0676-7547
Modestas Filipavicius
Nikita Janakarajan, http://orcid.org/0000-0001-7886-8385
Antonio Cardinale
Teodoro Laino, http://orcid.org/0000-0001-8717-0456
María Rodríguez Martínez, http://orcid.org/0000-0003-3766-4233

Abstract

Bridging systems biology and drug design, we propose a deep learning framework for de novo discovery of molecules tailored to bind given protein targets. We exemplify the methodology on the task of designing antiviral candidates that target SARS-CoV-2-related proteins. Crucially, our framework does not require fine-tuning for specific proteins; instead, it is demonstrated to generalize to unseen targets, proposing ligands with high predicted binding affinities against them. Coupling our framework with the automatic retrosynthesis prediction of IBM RXN for Chemistry, we demonstrate the feasibility of swift chemical synthesis of molecules with potential antiviral properties that were designed against a specific protein target. In particular, we synthesize an antiviral candidate designed against the host protein angiotensin-converting enzyme 2 (ACE2), a surface receptor on human respiratory epithelial cells that facilitates SARS-CoV-2 cell entry by binding the viral spike glycoprotein.

This is achieved as follows. First, we train a multimodal ligand–protein binding affinity model to predict the affinities of bioactive compounds to target proteins and couple this model with pharmacological toxicity predictors. Using this multi-objective score as the reward function of a conditional molecular generator that consists of two variational autoencoders (VAEs), our framework steers generation toward regions of chemical space with high-reward molecules. Specifically, we explore the challenging setting of generating ligands against unseen protein targets by performing leave-one-out cross-validation on 41 SARS-CoV-2-related target proteins. Using deep reinforcement learning, we demonstrate that in 35 out of 41 cases the generation is biased toward sampling binding ligands, with an average increase of 83% compared to an unbiased VAE. The generated molecules exhibit favorable properties in terms of target binding affinity, selectivity and drug-likeness. We use molecular retrosynthetic models to assess the synthetic accessibility of the best generated hit molecules. Finally, with this end-to-end framework, we synthesize 3-Bromobenzylamine, a potential inhibitor of the host ACE2 protein, based solely on the recommendations of a molecular retrosynthesis model and a synthesis protocol prediction model. We hope that our framework can contribute to the swift discovery of de novo molecules with desired pharmacological properties.
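The evaluation logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the reward weights, the 0.5 binding threshold, and the reading of the 83% figure as a relative increase in the fraction of predicted binders are all hypothetical choices made for this example. The predictors and the VAE-based generator themselves are not modeled here; their outputs are represented by plain numbers.

```python
from typing import Iterator, List, Sequence, Tuple


def combined_reward(affinity: float, toxicity: float,
                    affinity_weight: float = 1.0,
                    toxicity_weight: float = 0.5) -> float:
    """Hypothetical multi-objective reward.

    Rewards high predicted binding and low predicted toxicity. Both inputs
    are assumed to be scores in [0, 1]; the weights are illustrative and
    not taken from the paper.
    """
    if not (0.0 <= affinity <= 1.0 and 0.0 <= toxicity <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    return affinity_weight * affinity + toxicity_weight * (1.0 - toxicity)


def leave_one_out(targets: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_targets, held_out_target) pairs, one per target."""
    for i, held_out in enumerate(targets):
        yield list(targets[:i]) + list(targets[i + 1:]), held_out


def binder_fraction(affinities: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of sampled molecules whose predicted affinity exceeds the
    (assumed) binding threshold."""
    if not affinities:
        raise ValueError("need at least one molecule")
    return sum(a > threshold for a in affinities) / len(affinities)


def relative_increase(biased: float, unbiased: float) -> float:
    """Percent increase of the biased binder fraction over the unbiased one."""
    if unbiased <= 0:
        raise ValueError("unbiased fraction must be positive")
    return 100.0 * (biased - unbiased) / unbiased


if __name__ == "__main__":
    # Placeholder target names; the study uses 41 SARS-CoV-2-related proteins.
    for train, held_out in leave_one_out(["ACE2", "target_B", "target_C"]):
        print(f"train on {train}, generate against unseen {held_out}")
    print(combined_reward(affinity=0.8, toxicity=0.25))
```

In a full pipeline, each held-out target would condition the generator, the reward would drive a policy-gradient update, and `binder_fraction` would be compared between the biased generator and an unbiased VAE baseline for that target.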

Journal: Machine Learning: Science and Technology, article 025024
Published: 25 March 2021 (received 27 November 2020; accepted 19 February 2021)
DOI: 10.1088/2632-2153/abe808
Article URL: https://iopscience.iop.org/article/10.1088/2632-2153/abe808
Copyright: © 2021 The Author(s). Published by IOP Publishing Ltd
License: http://creativecommons.org/licenses/by/4.0
Funding: H2020 European Research Council, award 826121

Item Type: Article
Subjects: Academic Digital Library > Multidisciplinary
Depositing User: Unnamed user with email info@academicdigitallibrary.org
Date Deposited: 03 Jul 2023 04:30
Last Modified: 16 Oct 2023 03:57
URI: http://publications.article4sub.com/id/eprint/1959
