Bidirectional Monte Carlo (BDMC)
- Bidirectional Monte Carlo (BDMC) is a technique that uses forward and reverse AIS to generate stochastic sandwich bounds on the log marginal likelihood.
- It provides a reliable diagnostic: the computed forward–reverse gap upper-bounds the divergence between the approximate and true posterior distributions.
- BDMC underpins practical methods like the BREAD protocol, which validates inference quality on both simulated and real data for robust model calibration.
Bidirectional Monte Carlo (BDMC) is a technique for obtaining accurate stochastic upper and lower bounds on the log marginal likelihood (log-ML) or, more generally, for bounding the divergence between approximate and true posterior distributions in probabilistic models. BDMC operates by running annealed importance sampling (AIS) or sequential Monte Carlo (SMC) both in the forward (prior to posterior) and reverse (posterior to prior) directions, capitalizing on the reversibility of these algorithms when initialized with exact samples from the target distribution. This framework yields sandwich bounds on the log marginal likelihood, facilitates precise model evidence estimation, and underpins key diagnostics for Markov chain Monte Carlo (MCMC) inference—especially within the context of simulated data where exact posterior samples are available (Grosse et al., 2016, Grosse et al., 2015).
1. Foundations: Annealed Importance Sampling and Marginal Likelihood Estimation
Marginal likelihood estimation is central to Bayesian model selection but typically involves intractable integrals over latent variables and parameters. Annealed importance sampling (AIS) addresses this challenge by defining a sequence of unnormalized densities $f_0, f_1, \ldots, f_K$, each associated with a normalized distribution $p_t(x) = f_t(x)/\mathcal{Z}_t$. The sequence interpolates from an initial distribution $p_0$ (e.g., the prior, with known $\mathcal{Z}_0$) to the posterior $p_K(x) = p(x \mid \mathbf{y})$, commonly along the geometric path $f_t(x) = f_0(x)^{1-\beta_t} f_K(x)^{\beta_t}$ with $0 = \beta_0 < \beta_1 < \cdots < \beta_K = 1$ (Grosse et al., 2015). The process uses reversible MCMC kernels $\mathcal{T}_t$, each leaving $p_t$ invariant, to transition between successive distributions.
An AIS run produces a nonnegative, unbiased estimate $\hat{\mathcal{Z}}$ of the partition function ratio $\mathcal{Z}_K/\mathcal{Z}_0$, together with an approximate sample from the terminal distribution. The estimator has two key properties:
- $\mathbb{E}[\hat{\mathcal{Z}}] = \mathcal{Z}_K/\mathcal{Z}_0$, so by Jensen's inequality $\mathbb{E}[\log \hat{\mathcal{Z}}] \le \log(\mathcal{Z}_K/\mathcal{Z}_0)$; thus $\log \hat{\mathcal{Z}}$ is a stochastic lower bound on the log partition function ratio (equivalently, on $\log p(\mathbf{y})$ when $\mathcal{Z}_0 = 1$).
- The marginal distribution $q_{\mathrm{fwd}}$ of the final AIS sample does not generally coincide with the posterior $p(x \mid \mathbf{y})$, owing to imperfect mixing of the transition kernels (Grosse et al., 2016).
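As a concrete illustration (a minimal sketch, not code from the cited papers), forward AIS on a one-dimensional conjugate Gaussian model, where the true log-ML is available in closed form, exhibits the stochastic lower bound directly; all variable names and tuning choices here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
y = 1.5                                   # single observed datum
K = 200                                   # number of annealing steps
betas = np.linspace(0.0, 1.0, K + 1)      # linear schedule for simplicity

def log_prior(x):                         # x ~ N(0, 1)
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_lik(x):                           # y | x ~ N(x, 1)
    return -0.5 * (y - x)**2 - 0.5 * np.log(2 * np.pi)

def log_f(x, beta):                       # geometric path: prior * likelihood^beta
    return log_prior(x) + beta * log_lik(x)

def mh_step(x, beta, scale=1.0):
    """One Metropolis-Hastings step leaving f_beta invariant."""
    prop = x + scale * rng.normal(size=x.shape)
    accept = np.log(rng.uniform(size=x.shape)) < log_f(prop, beta) - log_f(x, beta)
    return np.where(accept, prop, x)

n_chains = 1000
x = rng.normal(size=n_chains)             # exact samples from p_0 (the prior)
log_w = np.zeros(n_chains)
for t in range(1, K + 1):
    log_w += log_f(x, betas[t]) - log_f(x, betas[t - 1])   # weight update
    x = mh_step(x, betas[t])                               # transition kernel

# Closed form: marginally y ~ N(0, 2), so log p(y) is known exactly
true_log_ml = -0.5 * y**2 / 2.0 - 0.5 * np.log(2 * np.pi * 2.0)
print(f"E[log w] = {log_w.mean():.3f}  (true log-ML = {true_log_ml:.3f})")
```

With this many steps in one dimension the averaged log-weight sits just below the true value, as the lower-bound property predicts.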
2. The BDMC Method: Forward and Reverse AIS
BDMC extends AIS by running the annealing chain in both directions: forward (prior to posterior) and reverse (posterior to prior). It exploits the fact that, given an exact posterior sample $x_K \sim p(x \mid \mathbf{y})$, the same sequence can be run backward to obtain a nonnegative, unbiased estimate $\hat{\mathcal{Z}}_{\mathrm{rev}}$ of the reciprocal ratio $\mathcal{Z}_0/\mathcal{Z}_K$ (Grosse et al., 2015).
Outline of both passes (annealing schedule $0 = \beta_0 < \cdots < \beta_K = 1$, with $\log w$ initialized to 0):
| Step | Forward AIS | Reverse AIS |
|---|---|---|
| Initialization | $x_0 \sim p_0$ (exact prior sample) | $x_K \sim p_K$ (exact posterior sample) |
| Weight update | $\log w \mathrel{+}= \log f_t(x_{t-1}) - \log f_{t-1}(x_{t-1})$ | $\log w \mathrel{+}= \log f_{t-1}(x_t) - \log f_t(x_t)$ |
| State transition | $x_t \sim \mathcal{T}_t(\cdot \mid x_{t-1})$ | $x_{t-1} \sim \mathcal{T}_{t-1}(\cdot \mid x_t)$ |
| Output | $\log \hat{\mathcal{Z}}_{\mathrm{fwd}}$ (lower bound) | $-\log \hat{\mathcal{Z}}_{\mathrm{rev}}$ (upper bound) |
With these, BDMC provides:
- $\mathbb{E}[\log \hat{\mathcal{Z}}_{\mathrm{fwd}}] \le \log(\mathcal{Z}_K/\mathcal{Z}_0)$ (stochastic lower bound)
- $\mathbb{E}[-\log \hat{\mathcal{Z}}_{\mathrm{rev}}] \ge \log(\mathcal{Z}_K/\mathcal{Z}_0)$ (stochastic upper bound)
- As $K \to \infty$ and the kernels mix, both bounds converge to $\log(\mathcal{Z}_K/\mathcal{Z}_0) = \log p(\mathbf{y})$ (Grosse et al., 2015, Grosse et al., 2016).
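The sandwich can be seen end-to-end on a conjugate Gaussian toy model, where conjugacy supplies the exact posterior sample that the reverse pass needs (an illustrative sketch; model and names are not from the papers):

```python
import numpy as np

rng = np.random.default_rng(1)
y, K, n = 1.5, 200, 1000
betas = np.linspace(0.0, 1.0, K + 1)

def log_f(x, b):           # geometric path: N(0,1) prior, N(y | x, 1) likelihood
    return -0.5 * x**2 + b * (-0.5 * (y - x)**2) - (1 + b) * 0.5 * np.log(2 * np.pi)

def mh_step(x, b):
    prop = x + rng.normal(size=x.shape)
    acc = np.log(rng.uniform(size=x.shape)) < log_f(prop, b) - log_f(x, b)
    return np.where(acc, prop, x)

# Forward AIS: prior -> posterior, accumulating a stochastic lower bound
x = rng.normal(size=n)
lw_fwd = np.zeros(n)
for t in range(1, K + 1):
    lw_fwd += log_f(x, betas[t]) - log_f(x, betas[t - 1])
    x = mh_step(x, betas[t])

# Reverse AIS: exact posterior samples (conjugacy: N(y/2, 1/2)) -> prior
x = y / 2 + np.sqrt(0.5) * rng.normal(size=n)
lw_rev = np.zeros(n)
for t in range(K, 0, -1):
    lw_rev += log_f(x, betas[t - 1]) - log_f(x, betas[t])
    x = mh_step(x, betas[t - 1])

lower = lw_fwd.mean()                     # E[log Z_fwd]  <= log p(y)
upper = -lw_rev.mean()                    # -E[log Z_rev] >= log p(y)
true_log_ml = -0.25 * y**2 - 0.5 * np.log(4 * np.pi)   # y ~ N(0, 2) marginally
print(f"{lower:.3f} <= {true_log_ml:.3f} <= {upper:.3f}")
```

In this well-mixing one-dimensional setting the gap between the two bounds is a small fraction of a nat.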
3. Divergence Bounding and Theoretical Guarantees
The gap $\hat{\mathcal{B}} = -\log \hat{\mathcal{Z}}_{\mathrm{rev}} - \log \hat{\mathcal{Z}}_{\mathrm{fwd}}$ serves not only as a diagnostic for marginal likelihood estimation but also upper-bounds, in expectation, the symmetrized Kullback–Leibler (Jeffreys) divergence between the distribution $q_{\mathrm{fwd}}$ of AIS samples and the true posterior $p$. Formally,

$$D_{\mathrm{KL}}(q_{\mathrm{fwd}} \,\|\, p) + D_{\mathrm{KL}}(p \,\|\, q_{\mathrm{fwd}}) \;\le\; \mathbb{E}[\hat{\mathcal{B}}].$$

Thus $\mathbb{E}[\hat{\mathcal{B}}]$ is an upper bound on the Jeffreys divergence, making BDMC a tool not only for model evidence calibration but also for posterior quality diagnostics (Grosse et al., 2016).
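One way to see where the bound comes from is a sketch of the extended-state-space argument of Grosse et al. (2016), where $q_{\mathrm{fwd}}$ and $q_{\mathrm{rev}}$ denote the forward and reverse distributions over full annealing trajectories $x_{0:K}$:

```latex
\begin{align*}
\mathbb{E}\big[\log \hat{\mathcal{Z}}_{\mathrm{fwd}}\big]
  &= \log\frac{\mathcal{Z}_K}{\mathcal{Z}_0}
     - D_{\mathrm{KL}}\big(q_{\mathrm{fwd}}(x_{0:K}) \,\big\|\, q_{\mathrm{rev}}(x_{0:K})\big), \\
\mathbb{E}\big[-\log \hat{\mathcal{Z}}_{\mathrm{rev}}\big]
  &= \log\frac{\mathcal{Z}_K}{\mathcal{Z}_0}
     + D_{\mathrm{KL}}\big(q_{\mathrm{rev}}(x_{0:K}) \,\big\|\, q_{\mathrm{fwd}}(x_{0:K})\big).
\end{align*}
```

Subtracting, the expected gap equals the Jeffreys divergence between the two trajectory distributions; since the terminal marginal of $q_{\mathrm{rev}}$ is exactly $p$ and KL divergence can only decrease under marginalization, the expected gap upper-bounds the Jeffreys divergence between the terminal marginal of $q_{\mathrm{fwd}}$ and $p$.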
4. The BREAD Protocol for Real-Data Validation
Exact posterior samples are available for simulated data but not for real datasets. The BREAD (Bounding Divergences with REverse Annealing) protocol addresses this by:
- Running the chosen inference method (e.g., in Stan or WebPPL) on the real data to estimate hyperparameters $\hat{\eta}$.
- Simulating synthetic parameters, latents, and data $(\theta^{\star}, \mathbf{z}^{\star}, \mathbf{y}^{\star})$ from the model with hyperparameters fixed to $\hat{\eta}$; by construction, $(\theta^{\star}, \mathbf{z}^{\star})$ is an exact posterior sample given $\mathbf{y}^{\star}$.
- Performing BDMC on $\mathbf{y}^{\star}$ to compute the lower and upper bounds $\hat{\mathcal{L}}$ and $\hat{\mathcal{U}}$.
- Comparing convergence curves of the log-ML estimates on the real versus simulated data across numbers of annealing steps $K$. If the curves behave similarly, one can trust the gap $\hat{\mathcal{U}} - \hat{\mathcal{L}}$ measured on simulated data as a proxy for the true Jeffreys divergence on the real data.
- In hierarchical models, a brief MCMC warm-up initialized at the simulated values can approximate posterior draws for the reverse AIS pass; empirically, a small number of steps suffices (Grosse et al., 2016).
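The steps above can be compressed into a self-contained sketch on a toy conjugate model (all model choices, estimators, and names here are illustrative, not the paper's code); the key point is that the simulated latent is itself an exact posterior sample for the reverse pass:

```python
import numpy as np

rng = np.random.default_rng(2)
m, K, n_chains = 5, 100, 500
betas = np.linspace(0.0, 1.0, K + 1)

# Toy model with hyperparameter eta: x ~ N(eta, 1); y_i | x ~ N(x, 1), i = 1..m
y_real = np.array([1.2, 0.8, 1.5, 1.1, 0.9])      # stand-in for real data

# Step 1: point-estimate the hyperparameter on real data (simple moment estimate)
eta_hat = y_real.mean()

# Step 2: simulate synthetic data; by construction x_star is an exact
# posterior sample given y_star, which is exactly what reverse AIS needs
x_star = eta_hat + rng.normal()
y_star = x_star + rng.normal(size=m)

def log_f(x, b):                                   # tempered unnormalized density
    lp = -0.5 * (x - eta_hat)**2 - 0.5 * np.log(2 * np.pi)
    ll = -0.5 * ((y_star[None, :] - x[:, None])**2).sum(axis=1) \
         - 0.5 * m * np.log(2 * np.pi)
    return lp + b * ll

def mh_step(x, b, scale=0.5):
    prop = x + scale * rng.normal(size=x.shape)
    acc = np.log(rng.uniform(size=x.shape)) < log_f(prop, b) - log_f(x, b)
    return np.where(acc, prop, x)

# Step 3: BDMC on the simulated data
x = eta_hat + rng.normal(size=n_chains)            # forward: start from the prior
lw_f = np.zeros(n_chains)
for t in range(1, K + 1):
    lw_f += log_f(x, betas[t]) - log_f(x, betas[t - 1])
    x = mh_step(x, betas[t])

x = np.full(n_chains, x_star)                      # reverse: start from the exact sample
lw_r = np.zeros(n_chains)
for t in range(K, 0, -1):
    lw_r += log_f(x, betas[t - 1]) - log_f(x, betas[t])
    x = mh_step(x, betas[t - 1])

lower, upper = lw_f.mean(), -lw_r.mean()
gap = upper - lower                                # proxy for inference quality
print(f"log-ML in [{lower:.3f}, {upper:.3f}], gap = {gap:.3f} nats")
```

In the full protocol this BDMC run on $\mathbf{y}^{\star}$ would then be compared against the inference method's convergence behavior on the real data.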
5. Implementation in Probabilistic Programming Systems
BDMC and BREAD require several system-level features:
- Tempering hooks: evaluating unnormalized log-densities at arbitrary inverse temperatures $\beta \in [0, 1]$. Stan required a power-posterior API; WebPPL involved trace-based evaluation with a $\beta$ exponent on likelihood terms.
- Reversible kernels: a choice of Metropolis–Hastings (WebPPL) or Hamiltonian Monte Carlo/No-U-Turn Sampler (HMC/NUTS in Stan), each leaving the corresponding intermediate distribution $p_t$ invariant.
- AIS weight tracking: inference engine modifications to return the accumulated log-weight $\log w$ and to allow reverse-path execution.
- Orchestration scripts: Automating the BREAD protocol—hyperparameter estimation, synthetic dataset simulation, BDMC runs, and curve comparison (Grosse et al., 2016).
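What a tempering hook amounts to can be sketched as a minimal interface (a hypothetical illustration, not Stan's or WebPPL's actual API; the class and parameter names are invented):

```python
class TemperedModel:
    """Exposes the geometric-path density log f_beta(x) = log p(x) + beta * log p(y | x)."""

    def __init__(self, log_prior, log_lik):
        self.log_prior = log_prior
        self.log_lik = log_lik

    def log_density(self, x, beta):
        if not 0.0 <= beta <= 1.0:
            raise ValueError("inverse temperature must lie in [0, 1]")
        return self.log_prior(x) + beta * self.log_lik(x)

# beta = 0 recovers the prior; beta = 1 the (unnormalized) posterior
model = TemperedModel(log_prior=lambda x: -0.5 * x**2,
                      log_lik=lambda x: -0.5 * (1.5 - x)**2)
print(model.log_density(0.0, 0.0))   # prior term only: 0.0
print(model.log_density(0.0, 1.0))   # adds the full likelihood term: -1.125
```

An AIS driver only needs this one entry point plus a kernel that targets `log_density(·, beta)` at each step, which is why retrofitting existing systems is feasible.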
Empirical studies demonstrated that the representation of latent-variable models (collapsed vs. uncollapsed) can have significant effects on convergence rates and computational cost, motivating BDMC/BREAD as practical design tools.
6. Empirical Performance and Comparative Findings
Application of BDMC to small-dimension latent-variable models (mixtures, matrix factorization, binary-attribute linear-Gaussian) showed the following:
- The forward–reverse gap can routinely be driven below 1 nat, yielding near-oracle accuracy for the log marginal likelihood.
- The upper bound on the Jeffreys divergence is within 10–30% of the exact value, even with poor mixing.
- Standard estimators (BIC, simple SIS, the harmonic mean estimator, VB) were often inaccurate compared to BDMC, with only AIS, single-particle SMC, and nested sampling efficiently achieving RMSE within 10 nats (Grosse et al., 2015).
- BDMC identified implementation bugs (e.g., in WebPPL's multivariate-Gaussian sampler) by detecting reversals of the expected inequality, underscoring its diagnostic value (Grosse et al., 2016).
7. Recommendations and Practical Guidelines
Effective use of BDMC requires:
- Always simulating a small synthetic dataset from the model to obtain exact posterior samples for reverse AIS.
- Monitoring the forward–reverse gap and driving it below a practical threshold (typically 1 nat).
- Favoring sigmoidal annealing schedules for $\beta_t$ to allocate more annealing steps near the prior and posterior endpoints, e.g., $\beta_t \propto \sigma\!\left(\delta\left(\tfrac{2t}{K} - 1\right)\right)$, linearly rescaled so that $\beta_0 = 0$ and $\beta_K = 1$.
- Using the ground-truth log marginal likelihoods from BDMC to benchmark new estimators by mean squared error.
- Recognizing that the gap $\hat{\mathcal{B}}$ also serves as a posterior quality diagnostic, upper-bounding the symmetrized KL (Jeffreys) divergence (Grosse et al., 2015, Grosse et al., 2016).
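A sigmoidal schedule of the kind recommended above can be implemented in a few lines (the parameter name `delta` and its default are illustrative choices, not values from the papers):

```python
import numpy as np

def sigmoid_schedule(K, delta=4.0):
    """Sigmoidal annealing schedule: concentrates the beta_t near the
    endpoints 0 and 1, linearly rescaled so beta_0 = 0 and beta_K = 1."""
    t = np.linspace(-delta, delta, K + 1)
    s = 1.0 / (1.0 + np.exp(-t))          # logistic sigmoid
    return (s - s[0]) / (s[-1] - s[0])

betas = sigmoid_schedule(1000)
# Steps near the endpoints are finer than in the middle of the schedule
print(betas[1] - betas[0], betas[501] - betas[500])
```

Larger `delta` concentrates more steps at the endpoints; `delta -> 0` recovers a linear schedule.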
BDMC thus provides both a method for accurately sandwiching marginal likelihood calculations and an empirical tool for measuring and improving posterior inference quality, particularly in the context of MCMC-based or probabilistic programming workflows.