An Improved Metric and Benchmark for Assessing the Performance of Virtual Screening Models
Abstract: Structure-based virtual screening (SBVS) is a key workflow in computational drug discovery. SBVS models are assessed by measuring the enrichment of known active molecules over decoys in retrospective screens. However, the standard formula for enrichment cannot estimate model performance on very large libraries. Additionally, current screening benchmarks cannot easily be used with ML models due to data leakage. We propose an improved formula for calculating VS enrichment and introduce the BayesBind benchmarking set composed of protein targets that are structurally dissimilar to those in the BigBind training set. We assess current models on this benchmark and find that none perform appreciably better than a KNN baseline.
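For context, the limitation on large libraries can be seen with the commonly used enrichment factor, which divides the hit rate in the top-scoring fraction of a library by the hit rate of the whole library. The sketch below is illustrative only: the function name `enrichment_factor`, its arguments, and the toy data are assumptions rather than anything taken from the paper, and it shows the conventional formula, not the improved one the authors propose.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Conventional enrichment factor at a given selection fraction.

    scores    -- model scores, higher means "more likely active" (assumed convention)
    is_active -- booleans, True for known actives and False for decoys
    fraction  -- top fraction of the ranked library to inspect
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library with at least one active")

    # Number of compounds in the selected top fraction (at least one).
    n_selected = max(1, round(fraction * n_total))

    # Rank compounds from best to worst score and count actives in the top slice.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, active in ranked[:n_selected] if active)

    hit_rate_top = hits / n_selected
    hit_rate_all = n_actives / n_total
    return hit_rate_top / hit_rate_all


# Toy retrospective screen: 10 actives and 90 decoys, ranked perfectly.
toy_scores = list(range(100, 0, -1))
toy_labels = [True] * 10 + [False] * 90

ef_at_10_percent = enrichment_factor(toy_scores, toy_labels, fraction=0.10)
ef_at_1_percent = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
```

Even with a perfect ranking, both values in this toy example equal 10, because the conventional formula cannot exceed the ratio of total compounds to actives in the benchmark. A benchmark with a modest decoy-to-active ratio therefore cannot express the much higher enrichments that matter when screening very large libraries.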