Lo-Hi: Practical ML Drug Discovery Benchmark
Abstract: Finding new drugs is becoming increasingly difficult. One hope for drug discovery is to use machine learning models to predict molecular properties, which is why such models are being developed and tested on benchmarks such as MoleculeNet. However, existing benchmarks are unrealistic and differ too much from how the models are applied in practice. We have created a new practical Lo-Hi benchmark consisting of two tasks, Lead Optimization (Lo) and Hit Identification (Hi), which correspond to the real drug discovery process. For the Hi task, we designed a novel molecular splitting algorithm that solves the Balanced Vertex Minimum $k$-Cut problem. We tested state-of-the-art and classic ML models, revealing which work better in practical settings. We analyzed modern benchmarks and showed that they are unrealistic and overoptimistic.

Review: https://openreview.net/forum?id=H2Yb28qGLV
Lo-Hi benchmark: https://github.com/SteshinSS/lohi_neurips2023
Lo-Hi splitter library: https://github.com/SteshinSS/lohi_splitter
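To make the vertex $k$-cut idea behind the Hi split more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes molecules are represented as sets of "on" fingerprint bits, that two molecules are linked when their Tanimoto similarity exceeds a threshold, and that a split is valid when no link joins two different parts after some molecules are removed. The function names, the toy fingerprints, and the threshold of 0.4 are all hypothetical choices made for illustration; the balance and minimality objectives are not implemented here.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_edges(fps, threshold):
    """Link every pair of molecules whose similarity exceeds the threshold."""
    return [
        (i, j)
        for i, j in combinations(range(len(fps)), 2)
        if tanimoto(fps[i], fps[j]) > threshold
    ]


def is_vertex_k_cut(edges, assignment):
    """Return True if no edge joins two different parts.

    assignment[i] is the part index of molecule i, or None if molecule i
    was removed from the dataset to separate the parts.
    """
    for i, j in edges:
        part_i, part_j = assignment[i], assignment[j]
        if part_i is not None and part_j is not None and part_i != part_j:
            return False
    return True


# Toy fingerprints (hypothetical bit sets, not real molecules).
fps = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({2, 4, 5}),
    frozenset({4, 5, 6}),
    frozenset({5, 6, 7}),
]
# Threshold 0.4 is an arbitrary illustrative value (assumption).
edges = similarity_edges(fps, threshold=0.4)

# Removing molecule 2 separates {0, 1} from {3, 4}.
valid_split = is_vertex_k_cut(edges, [0, 0, None, 1, 1])
# Keeping molecule 2 in part 0 leaves a similar pair across the parts.
leaky_split = is_vertex_k_cut(edges, [0, 0, 0, 1, 1])
```

In this toy graph the similarity links form a chain, so discarding the single middle molecule is enough to obtain two parts with no similar pair across them. A balanced minimum cut would additionally constrain the part sizes and minimize the number of discarded molecules.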