Bidirectional Monte Carlo (BDMC)

Updated 20 January 2026
  • Bidirectional Monte Carlo (BDMC) is a technique that uses forward and reverse AIS to generate stochastic sandwich bounds on the log marginal likelihood.
  • It provides a reliable diagnostic by bounding the divergence between approximate and true posterior distributions through the computed forward-reverse gap.
  • BDMC underpins practical methods like the BREAD protocol, which validates inference quality on both simulated and real data for robust model calibration.

Bidirectional Monte Carlo (BDMC) is a technique for obtaining accurate stochastic upper and lower bounds on the log marginal likelihood (log-ML) or, more generally, for bounding the divergence between approximate and true posterior distributions in probabilistic models. BDMC operates by running annealed importance sampling (AIS) or sequential Monte Carlo (SMC) both in the forward (prior to posterior) and reverse (posterior to prior) directions, capitalizing on the reversibility of these algorithms when initialized with exact samples from the target distribution. This framework yields sandwich bounds on the log marginal likelihood, facilitates precise model evidence estimation, and underpins key diagnostics for Markov chain Monte Carlo (MCMC) inference—especially within the context of simulated data where exact posterior samples are available (Grosse et al., 2016, Grosse et al., 2015).

1. Foundations: Annealed Importance Sampling and Marginal Likelihood Estimation

Marginal likelihood estimation is central to Bayesian model selection but typically involves intractable integrals over latent variables and parameters. Annealed importance sampling (AIS) addresses this challenge by defining a sequence of unnormalized densities \{f_t(x)\}_{t=1}^T, each associated with a normalized distribution p_t(x) = f_t(x)/Z_t. The sequence interpolates from an initial distribution (p_1, e.g., the prior with known Z_1) to the posterior (p_T) (Grosse et al., 2015). Reversible MCMC kernels T_t(\cdot \mid \cdot) transition between successive distributions.

An AIS run produces a nonnegative unbiased estimate \hat{W} of the partition function ratio R = Z_T/Z_1 and an approximate sample from the terminal distribution. The estimator has two key properties:

  • \mathbb{E}[\hat{W}] = R; hence, by Markov's inequality, \log \hat{W} is a stochastic lower bound on \log R (it overestimates \log R by more than b nats with probability at most e^{-b}).
  • The marginal distribution q(x_T) of the final AIS sample does not generally coincide with p_T, because the kernels may fail to mix (Grosse et al., 2016).
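As a concrete illustration (a toy sketch, not taken from the cited papers): a minimal forward AIS estimator for a conjugate Gaussian model, where the true log marginal likelihood is known in closed form. The model, step size, and linear schedule are all illustrative choices.

```python
import math
import random

random.seed(0)
Y = 1.0  # single observation; prior x ~ N(0, 1), likelihood y | x ~ N(x, 1)

def log_f(x, beta):
    """Annealed unnormalized log density: f_beta(x) = prior(x) * likelihood(x)^beta."""
    log_prior = -0.5 * x * x - 0.5 * math.log(2 * math.pi)
    log_lik = -0.5 * (Y - x) ** 2 - 0.5 * math.log(2 * math.pi)
    return log_prior + beta * log_lik

def mh_step(x, beta, step=0.8):
    """One Metropolis-Hastings step leaving f_beta invariant."""
    prop = x + random.gauss(0, step)
    if math.log(random.random()) < log_f(prop, beta) - log_f(x, beta):
        return prop
    return x

def ais_forward(T=200):
    """One forward AIS run: returns log w, a stochastic lower bound on log(Z_T/Z_1)."""
    betas = [t / (T - 1) for t in range(T)]
    x = random.gauss(0, 1)  # exact sample from p_1 (the prior, with known Z_1 = 1)
    logw = 0.0
    for t in range(1, T):
        logw += log_f(x, betas[t]) - log_f(x, betas[t - 1])  # weight update
        x = mh_step(x, betas[t])                              # move under p_t
    return logw

# E[w] = Z_T / Z_1 = p(Y), so averaging exp(logw) over runs estimates the
# marginal likelihood; here p(Y) = N(Y; 0, 2) by conjugacy.
runs = [ais_forward() for _ in range(500)]
true_log_ml = -0.5 * math.log(2 * math.pi * 2) - Y * Y / 4
```

Averaging the weights (not the log weights) recovers the marginal likelihood, while the average log weight sits below it by Jensen's inequality — exactly the gap BDMC is designed to measure.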

2. The BDMC Method: Forward and Reverse AIS

BDMC extends AIS by running the chain in both directions—forward (prior to posterior) and reverse (posterior to prior)—exploiting the fact that, given an exact sample x_T \sim p_T, one can run the same annealing sequence backward to obtain unbiased importance weights for Z_1/Z_T (Grosse et al., 2015).

Pseudocode outlining both passes:

| Step | Forward AIS | Reverse AIS |
| --- | --- | --- |
| Initialization | x_1 \sim p_1 | x_T \sim p_T (exact) |
| Weight update | w \leftarrow w \cdot f_t(x_{t-1})/f_{t-1}(x_{t-1}) | w \leftarrow w \cdot f_{t-1}(x_t)/f_t(x_t) |
| State transition | x_t \sim T_t(\cdot \mid x_{t-1}) | x_{t-1} \sim T_t(\cdot \mid x_t) |
| Output | L = \log w_F (lower bound) | U = -\log w_R (upper bound) |

With these, BDMC provides:

  • L \leq \log R in expectation (stochastic lower bound)
  • U \geq \log R in expectation (stochastic upper bound)
  • As T \rightarrow \infty and the kernels mix, both L and U converge to \log R (Grosse et al., 2015, Grosse et al., 2016).
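A minimal end-to-end sketch of both passes on a conjugate Gaussian toy model, where the exact posterior N(Y/2, 1/2) is available to initialize the reverse pass (all modeling and tuning choices here are illustrative, not from the papers):

```python
import math
import random

random.seed(1)
Y = 1.0  # observation; prior x ~ N(0, 1), likelihood y | x ~ N(x, 1)
# Conjugacy gives the exact posterior N(Y/2, 1/2) and log Z_T = log N(Y; 0, 2).

def log_f(x, beta):
    """Annealed unnormalized log density: prior * likelihood^beta."""
    return (-0.5 * x * x - 0.5 * math.log(2 * math.pi)
            + beta * (-0.5 * (Y - x) ** 2 - 0.5 * math.log(2 * math.pi)))

def mh_step(x, beta, step=0.8):
    prop = x + random.gauss(0, step)
    accept = math.log(random.random()) < log_f(prop, beta) - log_f(x, beta)
    return prop if accept else x

def ais(x, betas):
    """One AIS pass along the schedule `betas`; returns the log importance weight."""
    logw = 0.0
    for b_prev, b in zip(betas, betas[1:]):
        logw += log_f(x, b) - log_f(x, b_prev)  # weight update
        x = mh_step(x, b)                        # transition targeting f_b
    return logw

T = 100
betas = [t / (T - 1) for t in range(T)]

# Forward: exact prior samples, beta goes 0 -> 1; L is a stochastic lower bound.
Ls = [ais(random.gauss(0, 1), betas) for _ in range(200)]
# Reverse: exact posterior samples, beta goes 1 -> 0; U = -log w_R is an upper bound.
Us = [-ais(random.gauss(Y / 2, math.sqrt(0.5)), betas[::-1]) for _ in range(200)]

L_bar = sum(Ls) / len(Ls)
U_bar = sum(Us) / len(Us)
true_log_Z = -0.5 * math.log(2 * math.pi * 2) - Y * Y / 4  # log N(Y; 0, 2)
```

Running the same `ais` routine on the reversed schedule is exactly the symmetry BDMC exploits: the reverse pass is only valid because it starts from an exact posterior sample.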

3. Divergence Bounding and Theoretical Guarantees

The gap \Delta = U - L serves not only as a diagnostic for marginal likelihood estimation but also upper-bounds, in expectation, the symmetrized Kullback–Leibler (Jeffreys) divergence between the distribution q(x_T) of AIS samples and the true posterior p_T(x). Formally,

\mathbb{E}[U - L] \geq D_{\mathrm{KL}}(q \,\|\, p_T) + D_{\mathrm{KL}}(p_T \,\|\, q) \equiv J(q, p_T)

Thus, B \equiv \mathbb{E}[\Delta] is an upper bound on the Jeffreys divergence, making BDMC a tool not only for model evidence calibration but also for posterior quality diagnostics (Grosse et al., 2016).
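The inequality follows from viewing AIS as importance sampling on the extended space of whole trajectories x_{1:T} (a sketch of the argument from Grosse et al., 2016, with q_{\mathrm{fwd}} the distribution of forward trajectories and p_{\mathrm{bwd}} that of reverse trajectories initialized from p_T):

```latex
\mathbb{E}[L] = \log R - D_{\mathrm{KL}}\big(q_{\mathrm{fwd}}(x_{1:T}) \,\|\, p_{\mathrm{bwd}}(x_{1:T})\big),
\qquad
\mathbb{E}[U] = \log R + D_{\mathrm{KL}}\big(p_{\mathrm{bwd}}(x_{1:T}) \,\|\, q_{\mathrm{fwd}}(x_{1:T})\big).
```

Marginalizing to x_T can only decrease a KL divergence, so D_{\mathrm{KL}}(q \,\|\, p_T) \leq D_{\mathrm{KL}}(q_{\mathrm{fwd}} \,\|\, p_{\mathrm{bwd}}) and D_{\mathrm{KL}}(p_T \,\|\, q) \leq D_{\mathrm{KL}}(p_{\mathrm{bwd}} \,\|\, q_{\mathrm{fwd}}); summing the two displayed identities then yields \mathbb{E}[U - L] \geq J(q, p_T).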

4. The BREAD Protocol for Real-Data Validation

Exact posterior samples are available on simulated data but not on real datasets. The BREAD (Bounding Divergences with REverse Annealing) protocol addresses this by:

  1. Running the chosen inference method (e.g., Stan, WebPPL) on real data to estimate hyperparameters \hat\theta.
  2. Simulating synthetic data \tilde{D} \sim p(D \mid \hat\theta).
  3. Performing BDMC on \tilde{D} to compute (L_{\mathrm{synth}}, U_{\mathrm{synth}}) and \Delta_{\mathrm{synth}}.
  4. Comparing convergence curves for L_{\mathrm{real}}(T) versus L_{\mathrm{synth}}(T) across T. If these are similar, one can trust \Delta_{\mathrm{synth}} as a proxy for the true Jeffreys divergence on real data.
  5. In hierarchical models, a brief MCMC warm-up initialized at \hat\theta may approximate posterior draws for the reverse AIS pass; empirically, a small number of steps suffices (Grosse et al., 2016).
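The steps above can be sketched as an orchestration routine with the inference, simulation, and BDMC components passed in as callables; the toy stubs below (`fit`, `sim`, `bdmc_stub` — all hypothetical) stand in for a real Stan or WebPPL backend:

```python
import random

def bread(fit_hyperparams, simulate_data, run_bdmc, real_data, T=100):
    """Steps 1-4 of the BREAD protocol: fit, simulate, run BDMC, compare bounds."""
    theta_hat = fit_hyperparams(real_data)       # 1. estimate hyperparameters on real data
    synth_data = simulate_data(theta_hat)        # 2. simulate D~ ~ p(D | theta_hat)
    L_synth, U_synth = run_bdmc(synth_data, T)   # 3. BDMC gap on synthetic data
    L_real, _ = run_bdmc(real_data, T)           # 4. forward bound on real data, for
    return {                                     #    the convergence-curve comparison
        "gap_synth": U_synth - L_synth,
        "L_synth": L_synth,
        "L_real": L_real,
    }

# Toy stubs standing in for a real probabilistic-programming backend:
random.seed(0)
real = [random.gauss(1.0, 1.0) for _ in range(20)]
fit = lambda data: sum(data) / len(data)                     # e.g. an MLE of the mean
sim = lambda mu: [random.gauss(mu, 1.0) for _ in range(20)]
def bdmc_stub(data, T):
    # Placeholder only: a real implementation runs forward and reverse AIS.
    return (-30.0 - 1.0 / T, -30.0 + 1.0 / T)

report = bread(fit, sim, bdmc_stub, real)
```

Step 4 would in practice be repeated over a grid of T values so the real and synthetic lower-bound curves can be compared.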

5. Implementation in Probabilistic Programming Systems

BDMC and BREAD require several system-level features:

  • Tempering hooks: Evaluating unnormalized log-densities at arbitrary inverse temperatures \beta_t. Stan required a power-posterior API; WebPPL involved trace-based evaluation with a \beta exponent on likelihood terms.
  • Reversible kernels: Choice of Metropolis–Hastings (WebPPL) or Hamiltonian Monte Carlo/No-U-Turn Sampler (HMC/NUTS in Stan), ensuring correct invariance with respect to each p_t.
  • AIS weight tracking: Inference-engine modifications to return L and allow reverse-path execution.
  • Orchestration scripts: Automating the BREAD protocol—hyperparameter estimation, synthetic dataset simulation, BDMC runs, and curve comparison (Grosse et al., 2016).
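A tempering hook can be illustrated generically (a sketch with hypothetical names, not the actual Stan or WebPPL API): the power posterior scales only the likelihood terms by \beta, leaving the prior untouched.

```python
def make_tempered_logpdf(log_prior, log_lik_terms):
    """Return log f_beta(x) = log_prior(x) + beta * sum of per-datum log likelihoods."""
    def log_f(params, beta):
        return log_prior(params) + beta * sum(log_lik_terms(params))
    return log_f

# Example: Gaussian prior on a mean parameter, iid Gaussian likelihood terms.
data = [0.8, 1.3, 0.5]
log_prior = lambda mu: -0.5 * mu * mu
log_lik_terms = lambda mu: [-0.5 * (y - mu) ** 2 for y in data]
log_f = make_tempered_logpdf(log_prior, log_lik_terms)
# beta = 0 recovers the (unnormalized) prior; beta = 1 the full posterior.
```

Keeping the prior and likelihood contributions separate in the trace is what makes this hook cheap to add to an existing inference engine.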

Empirical studies demonstrated that the representation of latent-variable models (collapsed vs. uncollapsed) can have significant effects on convergence rates and computational cost, motivating BDMC/BREAD as practical design tools.

6. Empirical Performance and Comparative Findings

Application of BDMC to small-dimension latent-variable models (mixtures, matrix factorization, binary-attribute linear-Gaussian) showed the following:

  • The gap \Delta = U - L can routinely be driven below 1 nat, yielding near-oracle accuracy for the log marginal likelihood.
  • The upper bound B on the Jeffreys divergence is within 10–30% of the exact value, even with poor mixing.
  • Standard estimators (BIC, simple SIS, the harmonic mean estimator, VB) were often inaccurate compared to BDMC, with only AIS, single-particle SMC, and Nested Sampling achieving \sim 10 nat RMSE efficiently (Grosse et al., 2015).
  • BDMC identified implementation bugs (e.g., in WebPPL's multivariate-Gaussian sampler) by detecting violations of the expected L \leq U ordering, underscoring its diagnostic value (Grosse et al., 2016).

7. Recommendations and Practical Guidelines

Effective use of BDMC requires:

  • Always simulating a small synthetic dataset from the model to obtain exact posterior samples for reverse AIS.
  • Monitoring the forward–reverse gap \Delta and driving it below a practical threshold (typically \lesssim 1 nat).
  • Favoring sigmoidal annealing schedules for \beta_t to allocate more annealing steps near the prior and posterior endpoints:

\beta_t = \frac{\sigma(\delta(2t/T-1))-\sigma(-\delta)}{\sigma(\delta)-\sigma(-\delta)},\quad \sigma(u)=1/(1+e^{-u})

  • Using the ground-truth log marginal likelihoods from BDMC to benchmark new estimators by mean squared error.
  • Recognizing that \Delta also serves as a posterior-quality diagnostic, upper-bounding in expectation the Jeffreys (symmetrized KL) divergence (Grosse et al., 2015, Grosse et al., 2016).
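The sigmoidal schedule can be implemented directly (a short sketch; \delta = 4 is an illustrative default):

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def sigmoidal_schedule(T, delta=4.0):
    """Beta schedule that concentrates annealing steps near both endpoints."""
    lo, hi = sigmoid(-delta), sigmoid(delta)
    return [(sigmoid(delta * (2.0 * t / T - 1.0)) - lo) / (hi - lo)
            for t in range(T + 1)]

betas = sigmoidal_schedule(1000)
```

Larger \delta concentrates more of the T steps near \beta = 0 and \beta = 1, where the annealed distributions typically change fastest.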

BDMC thus provides both a method for accurately sandwiching marginal likelihood calculations and an empirical tool for measuring and improving posterior inference quality, particularly in the context of MCMC-based or probabilistic programming workflows.
