
Variational Inference with Normalizing Flows

Updated 16 December 2025
  • VI-NFs is a methodology that integrates variational inference with deep normalizing flows to construct efficient, adaptive proposals for trans-dimensional Bayesian sampling.
  • The approach employs architectures like RealNVP to transform simple base distributions into complex posteriors, enabling effective model-aware proposal mechanisms.
  • Empirical results indicate that VI-NFs enhances sampler mixing and convergence, yielding 2–5× speedups and near-optimal acceptance rates in RJMCMC applications.

Variational Inference with Normalizing Flows (VI-NFs) is a methodology that integrates modern variational inference (VI) objectives with the expressivity of normalizing flows (NFs) to construct highly efficient proposal mechanisms for trans-dimensional Bayesian sampling frameworks such as Reversible Jump Markov Chain Monte Carlo (RJMCMC). By parameterizing transport maps with deep neural flows and optimizing them via tractable variational criteria, VI-NFs enables adaptive, model-aware cross-model proposals that substantially accelerate exploration and mixing in high-dimensional, multi-model state spaces (Yin et al., 14 Dec 2025).

1. Theoretical Foundation of VI-NFs

VI-NFs builds on the core principle of representing complex posteriors by transforming samples from a tractable base distribution through a bijective mapping parametrized by a neural network (a "normalizing flow"). For model $k$ with parameter $\theta_k \in \mathbb{R}^{n_k}$, a flow $T_k$ pushes forward a simple reference density $q_0(z)$ (e.g., an isotropic Gaussian) onto an approximation of the target posterior $\pi_k(\theta)$:

$$q_\phi(\theta) = q_0\left(T_k^{-1}(\theta)\right)\,\left| \det \frac{\partial T_k^{-1}(\theta)}{\partial \theta} \right|$$
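As a quick numerical sanity check of this change-of-variables formula, a one-dimensional affine map (illustrative values, not from the paper) can be compared against the Gaussian density it is known to induce:

```python
import math

a_scale, b_shift = 2.0, 1.0             # hypothetical affine flow T(z) = a*z + b

def q0(z):
    # Standard normal base density
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def q_flow(theta):
    # q0(T^{-1}(theta)) * |det dT^{-1}/dtheta|, as in the formula above
    z = (theta - b_shift) / a_scale
    return q0(z) / abs(a_scale)

def q_direct(theta):
    # N(b, a^2) density, which the pushforward should reproduce exactly
    var = a_scale ** 2
    return math.exp(-0.5 * (theta - b_shift) ** 2 / var) / math.sqrt(2 * math.pi * var)
```

For an affine map the pushforward of $\mathcal{N}(0,1)$ is exactly $\mathcal{N}(b, a^2)$, so the two densities agree pointwise.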

Optimization employs the reverse Kullback–Leibler (KL) divergence:

$$\mathrm{KL}(q_\phi \,\Vert\, \pi^*) = \mathbb{E}_{q_\phi}\left[\log q_\phi(\theta) - \log p(\theta, y)\right] + \log p(y)$$

Reverse KL is used specifically because it permits training solely with base samples $z \sim q_0$, thereby circumventing the need for expensive pilot MCMC samples from the target posterior, as in forward-KL-based methods (Yin et al., 14 Dec 2025). The consequence is direct amortized inference over the entire model collection.
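The mechanics can be sketched with a deliberately tiny example (all names and values hypothetical): a single 1-D affine "flow" $T(z) = az + b$ trained by stochastic gradient descent on the reverse KL against a Gaussian target, using only base samples:

```python
import random

random.seed(0)

MU, SIGMA = 2.0, 0.5        # target N(MU, SIGMA^2), accessed only via its log-density
a, b = 1.0, 0.0             # flow parameters: T(z) = a*z + b
lr, steps, batch = 0.05, 600, 128

for _ in range(steps):
    ga = gb = 0.0
    for _ in range(batch):
        z = random.gauss(0.0, 1.0)       # base sample only -- no posterior samples
        theta = a * z + b                # pathwise (reparameterized) sample
        g = (theta - MU) / SIGMA ** 2    # d(-log p~)/d theta for the Gaussian target
        ga += g * z
        gb += g
    ga = ga / batch - 1.0 / a            # the -1/a term comes from the -log|a| entropy term
    gb /= batch
    a -= lr * ga
    b -= lr * gb
```

For this target the reverse-KL optimum is $a = \sigma$, $b = \mu$; real VI-NFs replaces the affine map with a deep flow and the hand-derived gradient with automatic differentiation.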

2. Flow Architectures for Model-Aware Transport

In VI-NFs, the transport maps $T_k$ are implemented as deep normalizing flows, typically using architectures such as RealNVP that employ affine coupling layers with masking and parameter sharing. For input $x \in \mathbb{R}^D$:

$$\begin{aligned} z_{1:d} &= x_{1:d} \\ z_{d+1:D} &= x_{d+1:D} \exp\left( s_\phi(x_{1:d}) \right) + t_\phi(x_{1:d}) \end{aligned}$$

where $s_\phi(\cdot)$ and $t_\phi(\cdot)$ are neural networks. Deep composition of such coupling layers allows $T_k$ to capture intricate posterior dependencies unique to each model $k$. Conditional flows can be employed for amortized inference, where a shared flow is conditioned on the model index, further reducing training cost and facilitating proposal adaptation across the entire model set.
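A minimal affine coupling layer of this form can be written directly (a sketch with stand-in linear "networks" for $s_\phi$ and $t_\phi$; a real implementation would use MLPs and stack many layers with alternating masks):

```python
import numpy as np

d, D = 2, 4                                 # split point and total dimension
Ws = np.array([[0.1, -0.2], [0.3, 0.0]])    # hypothetical weights standing in for s_phi
Wt = np.array([[0.5, 0.1], [-0.4, 0.2]])    # hypothetical weights standing in for t_phi

def s(x1):
    return np.tanh(x1 @ Ws.T)               # bounded scale output for numerical stability

def t(x1):
    return x1 @ Wt.T

def forward(x):
    x1, x2 = x[:d], x[d:]
    z2 = x2 * np.exp(s(x1)) + t(x1)         # affine transform of the second block
    log_det = np.sum(s(x1))                 # log |det dz/dx| is just the sum of scales
    return np.concatenate([x1, z2]), log_det

def inverse(z):
    z1, z2 = z[:d], z[d:]
    x2 = (z2 - t(z1)) * np.exp(-s(z1))      # exact inverse without inverting the networks
    return np.concatenate([z1, x2])

x = np.array([0.5, -1.0, 2.0, 0.3])
z, log_det = forward(x)
x_rec = inverse(z)
```

Note that the inverse never requires inverting $s_\phi$ or $t_\phi$, and the log-determinant is a cheap sum, which is what makes deep compositions tractable.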

3. VI-NF Proposals in Transport RJMCMC Algorithms

The central utility of VI-NFs in the RJMCMC context is proposal construction for both within-model and cross-model moves:

  • Within-model proposals: Flows $T_k$ define transformations to a latent space where standard MCMC (e.g., HMC) updates are performed, followed by inversion to obtain valid proposals in parameter space ("NeuTra"-style). The target density in latent space is

$$\pi_k(z) \propto \pi(y \mid T_k(z), k)\, \pi(T_k(z) \mid k)\, \left| \det \frac{\partial T_k(z)}{\partial z} \right|$$

  • Between-model proposals: For a move $k \rightarrow k'$, the latent variable $z_k = T_k^{-1}(\theta_k)$ is augmented (or truncated) with auxiliary reference samples, mapped through a dimension-matching diffeomorphism $h_{k,k'}$, and finally pushed forward by $T_{k'}$:
  1. $z_k = T_k^{-1}(\theta_k)$
  2. $z_{k'} = h_{k,k'}(z_k, u)$, with $u \sim \nu$ if $n_{k'} > n_k$
  3. $\theta_{k'}' = T_{k'}(z_{k'})$
  4. Accept $(k', \theta_{k'}')$ with the RJMCMC probability, including the flow and mapping Jacobians.
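The within-model intuition can be checked in one dimension (illustrative, not from the paper): when $T_k$ is an exact Gaussian transport, the latent target above reduces to a standard normal up to a constant, so latent-space MCMC faces a trivially easy geometry:

```python
import math

mu, sigma = 1.5, 0.7                    # hypothetical 1-D Gaussian posterior N(mu, sigma^2)

def log_post(theta):
    # log posterior up to an additive constant
    return -0.5 * (theta - mu) ** 2 / sigma ** 2

def log_latent_target(z):
    # log pi_k(z): pull the posterior back through T_k(z) = sigma*z + mu
    theta = sigma * z + mu
    return log_post(theta) + math.log(sigma)    # + log |dT_k/dz|

# Up to one shared additive constant, this should equal the standard
# normal log-density -0.5*z^2 at every z.
offsets = [log_latent_target(z) - (-0.5 * z * z) for z in (-1.0, 0.0, 0.5, 2.0)]
```

With an imperfect learned flow the offsets would vary with $z$, and the residual non-Gaussianity is what the latent HMC updates must absorb.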

This construction provides non-linear, model-aware transport; exact transport maps can produce near-unit acceptance rates, and even approximate flows yield substantial efficiency improvements over naive proposals.
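The near-unit-acceptance claim can be verified on a toy two-model problem (an entirely hypothetical setup): each model's posterior is already standard normal, so the exact transports $T_k$ are identities, $h_{k,k'}$ is plain concatenation/truncation, and every log-Jacobian vanishes; the cross-model acceptance probability is then identically one:

```python
import math
import random

random.seed(1)

def log_joint(k, theta):
    # log pi(k, theta) up to a shared constant: equal model weights,
    # standard normal posterior within each model
    return -0.5 * sum(t * t for t in theta)

k, theta = 1, [0.0]                      # model 1: theta in R^1; model 2: theta in R^2
visits = {1: 0, 2: 0}
accepts = proposals = 0

for _ in range(20000):
    k_new = 2 if k == 1 else 1           # symmetric model proposal
    if k_new == 2:
        u = random.gauss(0.0, 1.0)       # birth: augment with reference noise u ~ N(0,1)
        theta_new = theta + [u]
        # log acceptance: joint ratio minus log nu(u); all Jacobians are 0
        log_alpha = log_joint(2, theta_new) - log_joint(1, theta) + 0.5 * u * u
    else:
        u = theta[-1]                    # death: drop the last coordinate
        theta_new = theta[:-1]
        log_alpha = log_joint(1, theta_new) - log_joint(2, theta) - 0.5 * u * u
    proposals += 1
    if random.random() < math.exp(min(0.0, log_alpha)):
        k, theta = k_new, theta_new
        accepts += 1
    visits[k] += 1

accept_rate = accepts / proposals
```

Here `log_joint` drops normalizing constants consistently across models, which is why the $\nu(u)$ term cancels exactly; with learned, imperfect flows the same acceptance ratio falls below one in proportion to the transport error.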

4. Marginal Likelihood Estimation and Model Comparison

Training flows via reverse KL not only delivers efficient proposals, but also supplies accurate estimates of the marginal likelihood:

$$p(y) = \mathbb{E}_{q_\phi}\left[\frac{p(\theta, y)}{q_\phi(\theta)}\right] \approx \frac{1}{N} \sum_{i=1}^N \frac{p(\theta_i, y)}{q_\phi(\theta_i)}, \qquad \theta_i \sim q_\phi$$

This importance-sampling marginal-likelihood estimator is critical for Bayesian model comparison, enabling computation of Bayes factors and model probabilities from the trained flow approximations. Empirical results demonstrate that VI-NFs deliver tight and unbiased estimates in realistic variable-selection and factor-analysis scenarios (Yin et al., 14 Dec 2025).
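A closed-form example (hypothetical conjugate model, not from the paper) shows why a well-trained flow gives low-variance evidence estimates: if $q_\phi$ equals the exact posterior, the importance weights are constant and the estimator is exact for any $N$:

```python
import math
import random

random.seed(0)
y = 1.3                                  # observed data point

# Prior theta ~ N(0,1), likelihood y|theta ~ N(theta,1), so the evidence
# is p(y) = N(y; 0, 2) and the exact posterior is N(y/2, 1/2).

def log_norm(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def log_joint(theta):
    # log p(theta, y) = log prior + log likelihood
    return log_norm(theta, 0.0, 1.0) + log_norm(y, theta, 1.0)

def log_q(theta):
    # q taken to be the exact posterior (a "perfect flow")
    return log_norm(theta, y / 2.0, 0.5)

N = 1000
weights = [
    math.exp(log_joint(th) - log_q(th))
    for th in (random.gauss(y / 2.0, math.sqrt(0.5)) for _ in range(N))
]
p_y_hat = sum(weights) / N
p_y_true = math.exp(log_norm(y, 0.0, 2.0))
```

With an approximate $q_\phi$ the weights fluctuate, and the estimator's variance grows with the mismatch between $q_\phi$ and the posterior.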

5. Empirical Evidence for Improved Mixing and Efficiency

Benchmark experiments on synthetic (“sinh–arcsinh”), factor analysis, and linear regression variable selection tasks demonstrate that RJMCMC samplers employing VI-NF transport maps achieve markedly faster mixing both within and across models compared to independence samplers, pilot-MCMC transport methods, and forward-KL trained flows. Effective sample size per CPU-second and convergence diagnostics indicate 2–5× speedups, frequently with cross-model acceptance rates approaching unity when flows are sufficiently expressive (Yin et al., 14 Dec 2025).

Conditional flow variants further reduce computational burden and provide competitive performance by amortizing inference across model space.

6. Algorithmic Structure and Implementation Considerations

The VI-NF RJMCMC workflow is summarized as follows:

def VI_NF_RJ_proposal(k, theta_k, flows, model_proposal, ref_base):
    # Map the current state into model k's latent space
    z_k = flows[k].inverse(theta_k)
    # Propose a destination model
    k_prime = sample(model_proposal(k))
    n_k, n_k_prime = dim(k), dim(k_prime)
    if n_k_prime > n_k:
        # Dimension-increasing move: augment with reference noise
        u = sample(ref_base, shape=(n_k_prime - n_k,))
        z_k_prime = h_k_kprime(z_k, u)
    else:
        # Dimension-decreasing move: drop surplus latent coordinates
        z_k_prime = truncate(z_k, n_k_prime)
    # Push forward through the destination model's flow
    theta_k_prime = flows[k_prime].forward(z_k_prime)
    # Compute acceptance ratio including the flow and mapping Jacobians
    alpha = compute_alpha(...)
    if uniform(0, 1) < alpha:
        return (k_prime, theta_k_prime)
    else:
        return (k, theta_k)

Training of the flows is conducted via reverse KL, using only samples from the base reference density, with the architecture (e.g., the number and width of RealNVP coupling layers) chosen for sufficient expressivity.

7. Relationship to Broader Transdimensional Proposal Methodology

VI-NFs represents a significant advance in the theory and practice of proposal design for trans-dimensional samplers. By learning cross-model nonlinear transport directly from the base density—rather than relying on pilot MCMC output or asymptotic moment-based proposals—VI-NFs integrates adaptive, model-specific information into RJMCMC updates. This distinguishes it from earlier informed RJMCMC (Laplace/moment proposals) (Gagnon, 2019), kD-tree interpolation (Farr et al., 2011), transport flows trained with forward KL (Davies et al., 2022), and generic multi-model samplers. VI-NFs decouples proposal construction from target sampling, which is especially advantageous in challenging or costly inference settings. Conditional flows provide further adaptability and can be scaled to large model collections.

8. Practical Significance and Limitations

VI-NFs requires neural flow training but eliminates the need for pilot Markov chains in every model, reducing total inference cost in regimes with expensive likelihood evaluation or where the number of models is large. The marginal likelihood and Bayes factor estimates produced by VI-NFs retain accuracy and stability critical for model selection and comparison tasks (Yin et al., 14 Dec 2025). Limitations include the computational burden of training deep flows for very high-dimensional models, and the requirement that the flows be sufficiently expressive to match the posterior geometry; underfitting can degrade acceptance rates.

In summary, VI-NFs provides an explicit protocol for leveraging deep normalizing flows in variational inference to construct efficient, tractable proposal mechanisms for trans-dimensional sampling algorithms such as RJMCMC, with demonstrably superior mixing and efficiency properties compared to previous state-of-the-art transport-based and heuristic proposal methodologies.
