PoseBusters Passing Rate

Updated 2 February 2026

PoseBusters Passing Rate is a metric that calculates the fraction of protein–ligand poses meeting 20 stringent chemical, geometric, and energetic criteria.
It aggregates binary pass/fail results from each test, so a pose counts as valid only if it passes every check. State-of-the-art generative models reach rates of up to 95.9%, and rates differ widely across methods.
The metric is widely used to benchmark methods for structure-based drug design, and its failure patterns point to the value of integrating classical physics with deep-learning approaches in docking and generation algorithms.

The PoseBusters passing rate is a quantitative metric that evaluates the proportion of predicted protein–ligand poses that conform to a stringent suite of 20 geometric, chemical, and energetic quality-control checks. It is principally used to assess the physical plausibility and chemical feasibility of molecular docking outputs, especially in the context of deep generative models and structure-based drug design (SBDD). Unlike traditional RMSD-based criteria, PoseBusters passing rate (designated PBValid or PR) directly measures whether a pose is chemically valid, geometrically sound, and free of low-level steric and energetic pathologies as defined by established computational chemistry toolkits such as RDKit.

1. Definition and Mathematical Formulation

The PoseBusters passing rate is defined for a set of $N$ candidate docked poses $\{M_i\}_{i=1}^{N}$ as the fraction of poses that pass all specified quality checks. Each generated pose $M$ is assigned a binary label:

$$\mathrm{Pass}(M) = \begin{cases} 1 & \text{if } M \text{ passes all PoseBusters checks} \\ 0 & \text{otherwise} \end{cases}$$

The aggregate passing rate is:

$$\mathrm{PBValid} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{Pass}(M_i) \times 100\%$$

This definition applies regardless of method or dataset. PoseBusters tests are distributed across three main categories:

Chemical validity and consistency: Encompasses RDKit sanitization, InChI convertibility, and molecular-formula and bonding checks.
Intramolecular geometric and energetic correctness: Includes bond-length and bond-angle bounds ($0.75\, d_\text{LB}(i,j) \leq d_{ij} \leq 1.25\, d_\text{UB}(i,j)$, where $d_{ij}$ is the interatomic distance and $d_\text{LB}$, $d_\text{UB}$ are the distance-geometry lower and upper bounds), ring planarity ($\max_i \mathrm{dist}_\text{plane}(i) \leq 0.25$ Å), internal steric clashes, and strain energy ($E_\text{pred}/\bar{E} \leq 100$, where $\bar{E}$ is the mean energy of a conformational ensemble of the same ligand).
Intermolecular plausibility: Minimum protein–ligand, ligand–cofactor, and ligand–water atom distances, and a pocket volume overlap constraint ($V_\text{overlap}/V_\text{lig} < 7.5\%$).
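Once distances and energies are available, the intramolecular bounds above reduce to simple comparisons. The sketch below assumes the caller already has 3D coordinates, a bond list, per-bond distance-geometry bounds, and force-field energies for the pose and its conformer ensemble (in practice these would come from a toolkit such as RDKit). The function names and the example bound values are hypothetical.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def bond_lengths_ok(
    coords: Sequence[Coord],
    bonds: Sequence[tuple[int, int]],
    lower: Sequence[float],
    upper: Sequence[float],
    lo_scale: float = 0.75,
    hi_scale: float = 1.25,
) -> bool:
    """Check lo_scale * d_LB(i,j) <= d_ij <= hi_scale * d_UB(i,j) for every bond."""
    if not (len(bonds) == len(lower) == len(upper)):
        raise ValueError("bonds, lower and upper must have the same length")
    for (i, j), lb, ub in zip(bonds, lower, upper):
        d = math.dist(coords[i], coords[j])  # Euclidean distance in Å
        if not (lo_scale * lb <= d <= hi_scale * ub):
            return False
    return True


def energy_ratio_ok(e_pose: float, ensemble: Sequence[float], max_ratio: float = 100.0) -> bool:
    """Check E_pred / mean(ensemble energies) <= max_ratio."""
    mean_e = sum(ensemble) / len(ensemble)
    if mean_e <= 0:
        # The ratio is only interpretable for a positive reference energy.
        raise ValueError("mean ensemble energy must be positive")
    return e_pose / mean_e <= max_ratio


# Usage: one C-C bond with hypothetical bounds of 1.50-1.56 Å.
coords = [(0.0, 0.0, 0.0), (1.54, 0.0, 0.0)]
print(bond_lengths_ok(coords, [(0, 1)], [1.50], [1.56]))  # True
print(energy_ratio_ok(300.0, [1.0, 2.0, 3.0]))           # False (ratio 150)
```

Because the lower bound is scaled by 0.75 and the upper by 1.25, the hypothetical bond above would be accepted anywhere between 1.125 Å and 1.95 Å.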

For specific recent benchmarks, further submetrics have been introduced (e.g., PB-Valid-Mol for ligand-only checks and PB-Valid-Dock for protein–ligand interface checks), and for extensive SBDD tasks PBValid is applied to thousands of predicted poses per test set (Buttenschoen et al., 2023; Qiu et al., 12 May 2025; Jin et al., 30 Jan 2026).
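A minimal sketch of the aggregation formula, assuming each pose's check outcomes are already available as a name-to-boolean mapping. The check names and their split into ligand-only (PB-Valid-Mol) and interface (PB-Valid-Dock) groups are illustrative assumptions, not the PoseBusters package's own column names or grouping.

```python
from typing import Mapping, Sequence

# Hypothetical check names grouped into the two submetrics; the real
# PoseBusters output columns may be named and grouped differently.
MOL_CHECKS = ("sanitization", "inchi_match", "bond_lengths", "bond_angles",
              "ring_planarity", "internal_clash", "energy_ratio")
DOCK_CHECKS = ("dist_to_protein", "dist_to_cofactors", "dist_to_waters", "volume_overlap")


def pose_passes(results: Mapping[str, bool], checks: Sequence[str]) -> bool:
    """Pass(M): True only if every listed check passed; a missing check counts as a failure."""
    return all(results.get(name, False) for name in checks)


def passing_rate(poses: Sequence[Mapping[str, bool]], checks: Sequence[str]) -> float:
    """(1/N) * sum_i Pass(M_i) * 100%, restricted to the given checks."""
    if not poses:
        raise ValueError("need at least one pose")
    return 100.0 * sum(pose_passes(p, checks) for p in poses) / len(poses)


# Usage: four toy poses; one fails an interface check, one fails a ligand-only check.
ok = {name: True for name in MOL_CHECKS + DOCK_CHECKS}
poses = [ok, {**ok, "volume_overlap": False}, {**ok, "ring_planarity": False}, ok]
print(passing_rate(poses, MOL_CHECKS))                # 75.0 (PB-Valid-Mol)
print(passing_rate(poses, DOCK_CHECKS))               # 75.0 (PB-Valid-Dock)
print(passing_rate(poses, MOL_CHECKS + DOCK_CHECKS))  # 50.0 (PBValid)
```

Because the overall rate requires both groups to pass, it can never exceed either submetric; here the two failures fall on different poses, so PBValid drops to 50%.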

2. PoseBusters Check Pipeline and Criteria

PoseBusters employs a sequential pipeline, automating 20 discrete quality-control filters:

A. Chemical Validity:

RDKit molecular sanitization: passes only if atom valencies, aromaticity, and hybridization are chemically plausible.
InChI comparison: exact match to the reference connectivity, hydrogens, tetrahedral centers, and double-bond E/Z stereochemistry.
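The InChI criterion can be made concrete by comparing a predicted and a reference InChI layer by layer. This sketch assumes both strings have already been generated (for example with RDKit's `Chem.MolToInchi`). The choice of layers (formula, `/c` connectivity, `/h` hydrogens, `/b` double-bond stereo, `/t` and `/m` tetrahedral stereo) is an interpretation of the criterion above, not PoseBusters' exact implementation, and the function names are hypothetical.

```python
# Layers compared here (an assumption): molecular formula, connectivity (/c),
# hydrogens (/h), double-bond stereo (/b), tetrahedral stereo (/t, /m).
COMPARED_LAYERS = ("formula", "c", "h", "b", "t", "m")


def parse_inchi_layers(inchi: str) -> dict[str, str]:
    """Split a standard InChI into {layer prefix: layer content}."""
    parts = inchi.split("/")
    if not parts[0].startswith("InChI="):
        raise ValueError(f"not an InChI string: {inchi!r}")
    layers = {"formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        if part:
            # The first character names the layer; keep the first (main) occurrence,
            # because later sublayers (e.g. fixed-H) can repeat a prefix.
            layers.setdefault(part[0], part[1:])
    return layers


def inchi_layers_match(pred: str, ref: str,
                       layers: tuple[str, ...] = COMPARED_LAYERS) -> dict[str, bool]:
    """Per-layer equality between predicted and reference InChI (absent layer == "")."""
    p, r = parse_inchi_layers(pred), parse_inchi_layers(ref)
    return {name: p.get(name, "") == r.get(name, "") for name in layers}


# Usage: ethanol and dimethyl ether share a formula but differ in connectivity.
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
dimethyl_ether = "InChI=1S/C2H6O/c1-3-2/h1-2H3"
print(inchi_layers_match(ethanol, dimethyl_ether))  # formula True; c and h False
```

A pose would pass this check only if every compared layer matches, i.e. `all(inchi_layers_match(pred, ref).values())`.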

B. Geometric and Energetic Checks:

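Bond distances and angles: each must lie between 0.75 times the lower and 1.25 times the upper distance-geometry bound.
Aromatic ring and C=C planarity: deviation from the ring plane of at most 0.25 Å.
Internal non-bonded atom pairs: each pair must be at least as far apart as a scaled distance-geometry lower bound.
Strain energy: the energy of the docked pose divided by the mean energy of a conformational ensemble must not exceed 100.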
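The planarity test amounts to a distance-to-plane computation. This sketch swaps in Newell's method to estimate the ring normal (PoseBusters may fit the plane differently, for instance by least squares), assumes ring atoms are listed in traversal order, and uses the 0.25 Å tolerance quoted above. The function names are hypothetical.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def newell_normal(ring: Sequence[Point]) -> Point:
    """Approximate ring-plane normal via Newell's method (robust for near-planar polygons)."""
    nx = ny = nz = 0.0
    for i, (x1, y1, z1) in enumerate(ring):
        x2, y2, z2 = ring[(i + 1) % len(ring)]  # next atom, wrapping around the ring
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    return nx, ny, nz


def max_plane_deviation(ring: Sequence[Point]) -> float:
    """Largest distance (Å) of a ring atom from the plane through the ring centroid."""
    n = newell_normal(ring)
    norm = math.sqrt(sum(c * c for c in n))
    if norm == 0.0:
        raise ValueError("degenerate ring: cannot define a plane")
    c = [sum(p[k] for p in ring) / len(ring) for k in range(3)]
    return max(abs(sum(n[k] * (p[k] - c[k]) for k in range(3))) / norm for p in ring)


def ring_is_flat(ring: Sequence[Point], tol: float = 0.25) -> bool:
    """Planarity check: every ring atom within `tol` Å of the ring plane."""
    return max_plane_deviation(ring) <= tol


# Usage: a flat benzene-like hexagon vs. one puckered by +/-0.3 Å.
flat = [(1.4 * math.cos(k * math.pi / 3), 1.4 * math.sin(k * math.pi / 3), 0.0) for k in range(6)]
puckered = [(x, y, 0.3 if k % 2 == 0 else -0.3) for k, (x, y, _) in enumerate(flat)]
print(ring_is_flat(flat), ring_is_flat(puckered))  # True False
```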

C. Intermolecular Constraints:

Minimum separation from protein, cofactor, and water atoms, measured against scaled sums of van der Waals (or covalent) radii.
Pocket overlap: the fraction of ligand volume overlapping with protein and cofactors must be below 7.5%.
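Both intermolecular criteria can be sketched with atoms treated as spheres. The distance scale factor (0.75), the radius scaling used for volumes (0.8), and the 0.25 Å grid spacing are hypothetical defaults chosen for illustration; only the 7.5% overlap threshold comes from the text above. The grid estimate trades accuracy for simplicity, and the function names are hypothetical.

```python
import math
from itertools import product
from typing import Sequence

# (x, y, z, radius) in Å; the radius would typically be a van der Waals radius.
Atom = tuple[float, float, float, float]


def min_distance_ok(ligand: Sequence[Atom], partner: Sequence[Atom], scale: float = 0.75) -> bool:
    """True if no ligand-partner atom pair is closer than scale * (r_i + r_j)."""
    for lx, ly, lz, lr in ligand:
        for px, py, pz, pr in partner:
            if math.dist((lx, ly, lz), (px, py, pz)) < scale * (lr + pr):
                return False
    return True


def _inside_any(point: tuple[float, float, float], atoms: Sequence[Atom], scale: float) -> bool:
    """True if `point` lies inside at least one sphere of radius scale * r."""
    return any(math.dist(point, (x, y, z)) <= scale * r for x, y, z, r in atoms)


def overlap_fraction(ligand: Sequence[Atom], partner: Sequence[Atom],
                     scale: float = 0.8, step: float = 0.25) -> float:
    """Grid estimate of V_overlap / V_lig, with atoms modelled as spheres of radius scale * r."""
    lo = [min(a[k] - scale * a[3] for a in ligand) for k in range(3)]
    hi = [max(a[k] + scale * a[3] for a in ligand) for k in range(3)]
    # Regular grid covering the ligand's bounding box.
    axes = [[lo[k] + step * n for n in range(int((hi[k] - lo[k]) / step) + 1)] for k in range(3)]
    in_ligand = in_both = 0
    for point in product(*axes):
        if _inside_any(point, ligand, scale):
            in_ligand += 1
            if _inside_any(point, partner, scale):
                in_both += 1
    return in_both / in_ligand if in_ligand else 0.0


# Usage: one ligand atom and one protein atom 2.0 Å away.
ligand = [(0.0, 0.0, 0.0, 1.7)]
protein = [(2.0, 0.0, 0.0, 1.55)]
print(min_distance_ok(ligand, protein))  # False: 2.0 Å < 0.75 * 3.25 Å
print(overlap_fraction(ligand, protein) < 0.075)
```

A pose would pass the overlap criterion when `overlap_fraction(ligand, protein) < 0.075`; finer grids give better volume estimates at higher cost.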

All criteria are strictly binary: a pose failing any check counts as invalid, so only entirely "PB-valid" poses contribute to the numerator of PR.

3. Benchmark Datasets and Evaluation Protocols

The metric is typically applied on standardized protein–ligand benchmarks:

Astex Diverse (85 cases): Classical and deep-learning docking methods compared head-to-head.
CrossDocked2020 and PoseBusters Benchmark (hundreds to $>10^4$ cases): Data filtered to poses within 1 Å RMSD of the crystal structure, with train/test splits controlled for sequence identity ($\leq$ 30%) to limit leakage (Jin et al., 30 Jan 2026; Qiu et al., 12 May 2025).
Held-out out-of-distribution (OOD) sets: Complexes not seen in training, used to measure generalization.

The evaluation workflow involves: (1) de novo ligand generation (typically 100 candidates per protein target), (2) rigid docking into predicted or reference pockets (frequently with AutoDock Vina), (3) batch PB-valid checks, and (4) computation of passing rates and comparison against benchmarks and prior methods.
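The four-step protocol can be sketched as a loop over targets that pools every docked pose into one PBValid figure, matching the definition in Section 1. The three callables are hypothetical placeholders for whatever generator, docking tool (for example AutoDock Vina), and PoseBusters-style checker a study uses; no real tool API is implied.

```python
from typing import Callable, Iterable, Mapping, Sequence


def evaluate_pbvalid(
    targets: Iterable[str],
    generate: Callable[[str, int], Sequence[object]],  # step 1: de novo generation
    dock: Callable[[str, object], object],             # step 2: dock into the pocket
    check: Callable[[object], Mapping[str, bool]],     # step 3: PB-style checks
    n_candidates: int = 100,
) -> float:
    """Step 4: pooled PBValid (%) over every docked pose from every target."""
    n_total = n_valid = 0
    for target in targets:
        for ligand in generate(target, n_candidates):
            results = check(dock(target, ligand))
            n_total += 1
            # All-or-nothing: an empty result set is treated as a failure.
            n_valid += bool(results) and all(results.values())
    if n_total == 0:
        raise ValueError("no poses were generated")
    return 100.0 * n_valid / n_total


# Usage with toy stand-ins: every even-numbered "pose" passes its single check.
score = evaluate_pbvalid(
    ["target_a", "target_b"],
    generate=lambda target, n: list(range(n)),
    dock=lambda target, ligand: ligand,
    check=lambda pose: {"toy_check": pose % 2 == 0},
    n_candidates=4,
)
print(score)  # 50.0
```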

4. Comparative Performance Across Methods

PoseBusters passing rates vary widely across algorithmic paradigms:

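| Method | PBValid (%) | Setting |
|---|---|---|
| AR (Luo et al.) | 59.0 | Generative SBDD |
| Pocket2Mol | 72.3 | Generative SBDD |
| TargetDiff | 50.5 | Generative SBDD |
| DecompDiff | 71.7 | Generative SBDD |
| MolCRAFT | 84.6 | Generative SBDD |
| EvoEGF-Mol | 93.4 | Generative SBDD |
| MolPilot (VOS) | 95.9 | Generative SBDD |
| AutoDock Vina | 51.0 | PoseBusters Benchmark (docking) |
| CCDC Gold | 48.1 | PoseBusters Benchmark (docking) |
| DiffDock | 14.0 | PoseBusters Benchmark (docking) |
| DeepDock | 6.8 | PoseBusters Benchmark (docking) |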

State-of-the-art generative SBDD models based on information geometry (EvoEGF-Mol) and on Bayesian flow networks with optimized noise schedules (MolPilot) have pushed PBValid to reference or even super-reference levels (93–96%), well above earlier deep-learning baselines and above the rates reported for classical docking tools, although the docking-tool figures come from the PoseBusters Benchmark rather than the generative SBDD setting (Jin et al., 30 Jan 2026; Qiu et al., 12 May 2025; Buttenschoen et al., 2023). A plausible implication is that these geometric and probabilistic advances address the modality mismatch and training instability of earlier approaches. Conversely, deep-learning docking methods lacking strong inductive biases frequently score below 20% on novel benchmarks unless post-docking force-field minimization is applied, which can "rescue" many error modes.

5. Methodological Advances Underlying High Passing Rates

The high PBValid rates of recent models rest on several specific technical innovations:

Unified exponential-family modeling: Jointly modeling atomic coordinates and chemical types as a single composite exponential-family distribution under the Fisher–Rao metric prevents geometric–chemical mismatch and supports co-refinement of 3D structure and 2D chemical identity (Jin et al., 30 Jan 2026).
Evolving Exponential Geodesic Flow (EvoEGF): Dynamically concentrating the endpoint distributions avoids trajectory collapse and maintains stable variance and support at each generation time step, yielding more realistic samples and smoother training (Jin et al., 30 Jan 2026).
VLB-Optimal Scheduling (VOS): Interpreting the variational lower bound as a line integral in noise-schedule space, and optimizing the denoising path jointly across continuous and discrete variables, yields superior molecular and interaction fidelity (Qiu et al., 12 May 2025).
Post-docking minimization: Applying classical force fields (AMBER/Sage via OpenMM) to deep-learning outputs substantially increases PBValid (e.g., DiffDock from 14% to ≈50%), indicating that current models do not fully encode all relevant physical constraints (Buttenschoen et al., 2023).

These advances reduce artifacts such as twisted geometries, bond strain, and steric clashes, yielding low Jensen–Shannon divergences between generated and reference bond, angle, and torsion distributions, strain energies comparable to those of crystal structures, and improved fingerprint similarity for critical protein–ligand interactions.
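Distribution-level agreement of this kind can be summarized with the Jensen–Shannon divergence between histograms of a geometric quantity (for example, C–C bond lengths) from generated and reference molecules. A minimal sketch, assuming base-2 logarithms (so values lie in [0, 1]) and a fixed binning chosen by the caller; the helper names and sample values are hypothetical.

```python
import math
from typing import Sequence


def histogram(values: Sequence[float], lo: float, hi: float, bins: int) -> list[float]:
    """Normalized histogram with `bins` equal bins over [lo, hi]; outliers go to the edge bins."""
    if not values:
        raise ValueError("need at least one value")
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(bins - 1, max(0, int((v - lo) / width)))] += 1
    return [c / len(values) for c in counts]


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float]) -> float:
        # KL(a || m); terms with a_i = 0 contribute nothing.
        return sum(ai * math.log2(ai / mi) for ai, mi in zip(a, m) if ai > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Usage: compare toy C-C bond-length samples (Å) on a shared 8-bin grid.
reference = histogram([1.52, 1.53, 1.54, 1.55], 1.3, 1.7, 8)
generated = histogram([1.51, 1.53, 1.56, 1.62], 1.3, 1.7, 8)
print(js_divergence(reference, reference))  # 0.0
print(js_divergence(reference, generated))
```

Both histograms must share the same bins; lower values indicate that generated geometries follow the reference distribution more closely.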

6. Interpretations, Limitations, and Implications

High PoseBusters passing rates are correlated with chemically plausible, strain-free, and physically sensible candidate ligands. Recent models achieve near-perfect conformance on in-distribution CrossDocked benchmarks and maintain robust passing rates (~80%) on challenging out-of-distribution held-out tests.

However, the metric is fundamentally binary: it does not measure the degree of error for near-failing poses, nor does it capture criteria such as bioactivity, binding affinity, or specific pharmacophore matching. All-or-nothing filtering may therefore discard marginally non-ideal candidates that would remain viable after minimal energetic relaxation. The gains obtained from post-docking force-field minimization suggest that incorporating classical physics may be critical for further progress. A plausible implication is that future SBDD research, particularly with generative neural methods, will need to combine statistical manifold modeling with classical physical restraints to produce physically valid outputs that are both fast to generate and generalizable.

7. Impact and Future Directions

PoseBusters passing rate has become a key metric for benchmarking SBDD pipelines because it applies strict composite filtering of chemical, geometric, and energetic criteria. It enables direct comparison of the outputs of diverse methods without relying on pose RMSD alone, and it exposes limitations of both classical docking software and contemporary deep generative models. As information-geometric and schedule-based methods push PBValid towards "reference-level" rates, the metric is likely to remain central to algorithmic assessment, error analysis, and the identification of robust inductive biases for next-generation drug design platforms (Jin et al., 30 Jan 2026; Qiu et al., 12 May 2025; Buttenschoen et al., 2023).


References

Buttenschoen et al. (2023). PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences.

Qiu et al. (2025). Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule.

Jin et al. (2026). EvoEGF-Mol: Evolving Exponential Geodesic Flow for Structure-based Drug Design.
