PoseBusters Passing Rate
- PoseBusters Passing Rate is a metric that calculates the fraction of protein–ligand poses meeting 20 stringent chemical, geometric, and energetic criteria.
- It aggregates binary pass/fail results from each test, with state-of-the-art generative models achieving rates up to 95.9%, highlighting performance differences across methods.
- The metric is essential for structure-based drug design, guiding improvements in docking algorithms by revealing the need for integrating classical physics with deep-learning innovations.
The PoseBusters passing rate is a quantitative metric that evaluates the proportion of predicted protein–ligand poses that conform to a stringent suite of 20 geometric, chemical, and energetic quality-control checks. It is principally used to assess the physical plausibility and chemical feasibility of molecular docking outputs, especially in the context of deep generative models and structure-based drug design (SBDD). Unlike traditional RMSD-based criteria, PoseBusters passing rate (designated PBValid or PR) directly measures whether a pose is chemically valid, geometrically sound, and free of low-level steric and energetic pathologies as defined by established computational chemistry toolkits such as RDKit.
1. Definition and Mathematical Formulation
The PoseBusters passing rate is defined for a set of $N$ candidate docked poses as the fraction of poses passing all specified quality checks. Each generated pose $x_i$ is assigned a binary label:

$$v_i = \begin{cases} 1 & \text{if } x_i \text{ passes all 20 checks} \\ 0 & \text{otherwise} \end{cases}$$

The aggregate passing rate is:

$$\mathrm{PR} = \frac{1}{N} \sum_{i=1}^{N} v_i$$

This definition applies regardless of method or dataset. PoseBusters tests are distributed across three main categories:
- Chemical validity and consistency: Encompasses sanitization, InChI-convertibility, and molecular formula/bonding checks.
- Intramolecular geometric and energetic correctness: Includes bond length and angle bounds (within ±25% of ideal values), ring planarity (deviation ≤ 0.25 Å), internal steric clashes, and strain energy (energy ratio ≤ 100).
- Intermolecular plausibility: Minimum protein–ligand/cofactor/water atom distances, and pocket volume overlap constraints (≤ 7.5%).
For specific recent benchmarks, further submetrics have been introduced (e.g., PB-Valid-Mol for ligand-only checks and PB-Valid-Dock for protein–ligand interface checks), and for extensive SBDD tasks PBValid is applied to thousands of predicted poses per test set (Buttenschoen et al., 2023, Qiu et al., 12 May 2025, Jin et al., 30 Jan 2026).
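The definition above amounts to a row-wise AND over the per-check results followed by a mean. The following is a minimal sketch in Python with NumPy, assuming the checks have already been run and stored as a boolean matrix. The check names and the split into ligand-only and interface columns are hypothetical and only illustrate how submetrics in the spirit of PB-Valid-Mol and PB-Valid-Dock could be derived from the same table.

```python
import numpy as np

# Hypothetical check names; a real PoseBusters report has 20 columns.
CHECKS = ["sanitization", "bond_lengths", "internal_clash",
          "protein_distance", "volume_overlap"]
LIGAND_ONLY = ["sanitization", "bond_lengths", "internal_clash"]  # assumed ligand-only subset
INTERFACE = ["protein_distance", "volume_overlap"]                # assumed interface subset


def passing_rate(results: np.ndarray, columns=None) -> float:
    """Fraction of poses (rows) that pass every selected check (column)."""
    if columns is not None:
        idx = [CHECKS.index(c) for c in columns]
        results = results[:, idx]
    labels = results.all(axis=1)   # binary pass/fail label per pose
    return float(labels.mean())    # mean of the labels = passing rate


# Four toy poses; True means the check passed.
toy = np.array([
    [True, True, True, True, True],    # fully valid
    [True, True, True, False, True],   # too close to the protein
    [True, False, True, True, True],   # bad bond length
    [True, True, True, True, True],    # fully valid
])

pr_all = passing_rate(toy)
pr_mol = passing_rate(toy, LIGAND_ONLY)
pr_dock = passing_rate(toy, INTERFACE)
print(pr_all, pr_mol, pr_dock)
```

Because the overall label requires every column to pass, the overall rate can never exceed either of the subset rates.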
2. PoseBusters Check Pipeline and Criteria
PoseBusters employs a sequential pipeline, automating 20 discrete quality-control filters:
A. Chemical Validity:
- RDKit molecular sanitization: passes only if atom valencies, aromaticity, and hybridization are chemically plausible.
- InChI comparison: exact match to reference connectivity, hydrogens, tetrahedral centers, and double-bond E/Z stereochemistry.
B. Geometric and Energetic Checks:
- Bond distances and angles: each must lie within ±25% of its ideal bounds.
- Aromatic ring and C=C planarity: deviation from the best-fit plane ≤ 0.25 Å.
- Internal non-bonded atom pairs: minimum separation set by scaled distance-geometry bounds.
- Strain energy: the ratio of the docked pose's energy to that of a reference conformational ensemble must not exceed 100.
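Two of the geometric checks in this group can be sketched with NumPy alone. This is an illustration under stated assumptions, not the PoseBusters implementation: the function names and toy coordinates are hypothetical, the 25% tolerance and the 0.25 Å planarity threshold come from the list above, and the ideal bond length is an illustrative input supplied by the caller.

```python
import numpy as np


def bond_length_ok(length: float, ideal: float, tol: float = 0.25) -> bool:
    """True if the bond length lies within +/- tol (fractional) of the ideal value."""
    return (1.0 - tol) * ideal <= length <= (1.0 + tol) * ideal


def max_plane_deviation(coords: np.ndarray) -> float:
    """Largest distance of any atom from the least-squares plane through coords."""
    centered = coords - coords.mean(axis=0)
    # The right singular vector with the smallest singular value is the plane normal.
    _, _, vt = np.linalg.svd(centered)
    normal = vt[-1]
    return float(np.abs(centered @ normal).max())


def ring_planar(coords: np.ndarray, threshold: float = 0.25) -> bool:
    """True if every atom is within `threshold` (Å) of the best-fit plane."""
    return max_plane_deviation(coords) <= threshold


# Toy benzene-like ring: six atoms on a circle of radius 1.39 Å in the z = 0 plane.
angles = np.arange(6) * np.pi / 3
flat_ring = np.stack(
    [1.39 * np.cos(angles), 1.39 * np.sin(angles), np.zeros(6)], axis=1
)

# Same ring with one atom pushed 1 Å out of the plane.
bent_ring = flat_ring.copy()
bent_ring[0, 2] = 1.0

print(ring_planar(flat_ring), ring_planar(bent_ring))
print(bond_length_ok(1.39, ideal=1.39), bond_length_ok(1.90, ideal=1.39))
```

Fitting the plane by singular value decomposition avoids having to pick three reference atoms, so the result does not depend on atom ordering.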
C. Intermolecular Constraints:
- Minimum separation (weighted van der Waals/covalent radii) from protein, cofactors, waters.
- Pocket overlap: ligand volume overlapping with protein and cofactors ≤ 7.5%.
All criteria are strictly binary; a pose failing any single check counts as invalid, and only entirely "PB-valid" poses contribute to the numerator of PR.
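The minimum-separation check and the all-or-nothing rule can be illustrated in a few lines. This is a rough sketch rather than the PoseBusters code: the van der Waals radii table is a small illustrative subset, the 0.75 scale factor is an assumed weighting, and the function names and coordinates are hypothetical.

```python
import numpy as np

# Illustrative van der Waals radii in Å for a few elements (assumed values).
VDW_RADII = {"C": 1.70, "N": 1.55, "O": 1.52, "H": 1.20}


def min_distance_ok(lig_xyz, lig_elems, prot_xyz, prot_elems, scale=0.75):
    """True if every ligand-protein atom pair is farther apart than
    scale * (sum of the two van der Waals radii)."""
    lig_r = np.array([VDW_RADII[e] for e in lig_elems])
    prot_r = np.array([VDW_RADII[e] for e in prot_elems])
    # Pairwise distance matrix, shape (n_ligand_atoms, n_protein_atoms).
    dists = np.linalg.norm(lig_xyz[:, None, :] - prot_xyz[None, :, :], axis=-1)
    limits = scale * (lig_r[:, None] + prot_r[None, :])
    return bool((dists > limits).all())


def pb_valid(check_results: dict) -> bool:
    """All-or-nothing rule: a pose is valid only if every check passed."""
    return all(check_results.values())


protein = np.array([[0.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
protein_elems = ["O", "C"]

far_ligand = np.array([[2.0, 3.5, 0.0]])    # about 4 Å from both protein atoms
close_ligand = np.array([[1.0, 0.5, 0.0]])  # about 1.1 Å from the oxygen

far_ok = min_distance_ok(far_ligand, ["C"], protein, protein_elems)
close_ok = min_distance_ok(close_ligand, ["C"], protein, protein_elems)

# One failed check is enough to invalidate the whole pose.
valid_far = pb_valid({"sanitization": True, "min_distance": far_ok})
valid_close = pb_valid({"sanitization": True, "min_distance": close_ok})
print(valid_far, valid_close)
```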
3. Benchmark Datasets and Evaluation Protocols
The metric is typically applied on standardized protein–ligand benchmarks:
- Astex Diverse (85 cases): Classical and deep-learning docking methods compared head-to-head.
- CrossDocked2020 and PoseBusters Bench (hundreds to thousands of cases): Data filtered for crystal poses with RMSD < 1 Å, with train/test splits controlled for sequence identity (< 30%) to eliminate leakage (Jin et al., 30 Jan 2026, Qiu et al., 12 May 2025).
- Held-out OOD sets: Complexes not seen in training, used to measure generalization.
The evaluation workflow typically involves: (1) de novo ligand generation (100 candidates per protein target is typical), (2) rigid docking to predicted or reference pockets (frequently with AutoDock Vina), (3) batch PB-valid checks, and (4) computation of passing rates and comparison against benchmarks and prior methods.
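Because step (1) yields many candidates per target, the final passing rate can be pooled over all poses or averaged per target. The sketch below computes both, assuming the PB-valid labels from step (3) are already available. The target identifiers and labels are made up for illustration, and the source does not state which averaging the cited benchmarks use.

```python
from collections import defaultdict


def passing_rates(labels):
    """labels: iterable of (target_id, passed) pairs, one per generated pose.

    Returns (per_target, pooled, macro) where
      per_target maps each target to its own passing rate,
      pooled is the rate over all poses together,
      macro is the unweighted mean of the per-target rates.
    """
    by_target = defaultdict(list)
    for target_id, passed in labels:
        by_target[target_id].append(bool(passed))

    per_target = {t: sum(v) / len(v) for t, v in by_target.items()}
    n_poses = sum(len(v) for v in by_target.values())
    pooled = sum(sum(v) for v in by_target.values()) / n_poses
    macro = sum(per_target.values()) / len(per_target)
    return per_target, pooled, macro


# Toy input: target "A" has 4 poses (3 valid), target "B" has 2 poses (1 valid).
toy_labels = [("A", True), ("A", True), ("A", False), ("A", True),
              ("B", True), ("B", False)]

per_target, pooled, macro = passing_rates(toy_labels)
print(per_target, pooled, macro)
```

The two averages agree when every target contributes the same number of poses, which is the case when a fixed number of candidates is generated per target and none are dropped before checking.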
4. Comparative Performance Across Methods
PoseBusters passing rates vary widely across algorithmic paradigms:
| Method | PBValid (%) | Note |
|---|---|---|
| AR (Luo et al.) | 59.0 | |
| Pocket2Mol | 72.3 | |
| TargetDiff | 50.5 | |
| DecompDiff | 71.7 | |
| MolCRAFT | 84.6 | |
| EvoEGF-Mol | 93.4 | |
| MolPilot (VOS) | 95.9 | |
| AutoDock Vina | 51.0 | PB-Bench |
| CCDC Gold | 48.1 | PB-Bench |
| DiffDock | 14.0 | PB-Bench |
| DeepDock | 6.8 | PB-Bench |
State-of-the-art generative SBDD models based on information geometry (EvoEGF-Mol), Bayesian flow networks (MolPilot), and optimal schedule design have pushed PBValid to reference or even super-reference levels (93–96%), dramatically exceeding both classical docking tools and earlier deep-learning baselines (Jin et al., 30 Jan 2026, Qiu et al., 12 May 2025, Buttenschoen et al., 2023). A plausible implication is that these geometric and probabilistic advances are addressing modality mismatch and training instability in previous protocols. Conversely, deep-learning methods lacking robust inductive bias frequently report sub-20% on novel benchmarks unless post-docking force-field minimization is applied, which can "rescue" many error modes.
5. Methodological Advances Underlying High Passing Rates
The high PBValid rates of recent models rest on several explicit technical innovations:
- Unified Exponential-family modeling: Jointly modeling atomic coordinates and chemical types as a single composite exponential-family distribution under Fisher–Rao metric prevents geometric–chemical mismatch, supporting co-refinement of 3D structure and 2D identity (Jin et al., 30 Jan 2026).
- Evolving Exponential Geodesic Flow (EvoEGF): Endpoint distributions that concentrate dynamically avoid trajectory collapse and maintain stable variance/support at each generation time step, resulting in more realistic samples and smoother training (Jin et al., 30 Jan 2026).
- VLB-Optimal Scheduling (VOS): Interpreting the Variational Lower Bound as a line integral in noise-schedule space, and optimizing the denoising path jointly across continuous and discrete variables, yields superior molecular and interaction fidelity (Qiu et al., 12 May 2025).
- Post-docking minimization: Application of classical force fields (AMBER/Sage via OpenMM) to deep-learning outputs substantially increases PBValid (e.g., DiffDock from 14% to ≈50%), indicating current models do not fully encode all relevant physical constraints (Buttenschoen et al., 2023).
These advances reduce artifacts such as twisted geometries, bond strain, and steric clashes, yielding low Jensen–Shannon divergences in bond/angle/torsion distributions, strain energies comparable to crystal structures, and improved fingerprint similarity for critical protein–ligand interactions.
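The Jensen–Shannon divergence mentioned above compares the histogram of a geometric quantity, such as bond length, in generated molecules with the same histogram for reference structures. A minimal NumPy sketch follows. The bin range, bin count, and synthetic samples are arbitrary illustrative choices, not the settings used in the cited papers.

```python
import numpy as np


def js_divergence(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence (natural log) between two histograms."""
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def bond_length_jsd(generated, reference, bins=50, value_range=(1.0, 2.0)) -> float:
    """Histogram both samples on a shared grid, then compare them."""
    p, _ = np.histogram(generated, bins=bins, range=value_range)
    q, _ = np.histogram(reference, bins=bins, range=value_range)
    return js_divergence(p.astype(float), q.astype(float))


rng = np.random.default_rng(0)
reference = rng.normal(1.40, 0.02, size=5000)  # synthetic reference bond lengths in Å
good = rng.normal(1.40, 0.02, size=5000)       # generator matching the reference
bad = rng.normal(1.60, 0.02, size=5000)        # generator with stretched bonds

jsd_good = bond_length_jsd(good, reference)
jsd_bad = bond_length_jsd(bad, reference)
print(jsd_good, jsd_bad)
```

With the natural logarithm the divergence is bounded above by ln 2, which is approached when the two histograms barely overlap, while a well-matched generator gives a value close to zero.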
6. Interpretations, Limitations, and Implications
High PoseBusters passing rates correlate with chemically plausible, strain-free, and physically sensible candidate ligands. Recent models demonstrate almost perfect conformance on in-distribution CrossDocked2020 benchmarks, and maintain robust passing rates (around 80%) on challenging out-of-distribution held-out tests.
However, the metric is fundamentally binary and does not measure degree-of-error for near-failing poses, nor does it subsume criteria such as bioactivity, affinity, or specific pharmacophore matching. All-or-nothing filtering may thus discard marginally non-ideal candidates that remain viable after minimal energetic relaxation. Empirical evidence from the effect of force-field minimization suggests that incorporating classical physics elements may be critical for further progress. A plausible implication is that future research in SBDD—particularly with generative neural methods—must combine statistical manifold modeling with classical physical restraints to ensure both speed and generalizability of physically valid outputs.
7. Impact and Future Directions
PoseBusters passing rate has become a key metric for benchmarking SBDD pipelines, given its strict composite filtering of chemical, geometric, and energetic criteria. It enables direct comparison of diverse methods’ outputs, independent of RMSD-related errors and dataset-dependent noise, and exposes limitations of both classical docking software and contemporary deep generative models. As information-geometric and schedule-based methodologies continue to increase PBValid towards “reference-level” rates, the metric will remain central in algorithmic assessment, error analysis, and the identification of robust inductive biases for next-generation drug design platforms (Jin et al., 30 Jan 2026, Qiu et al., 12 May 2025, Buttenschoen et al., 2023).