
Nonconjugate Variational Message Passing

Updated 10 January 2026
  • Nonconjugate Variational Message Passing is a structured framework enabling approximate Bayesian inference for models with nonconjugate likelihoods.
  • It leverages fixed-point equations, natural gradients, and one-dimensional quadrature to efficiently handle complex likelihoods without closed-form solutions.
  • NCVMP scales to large datasets via stochastic updates and is applied in GLMMs, sparse GP regression, SVMs, and quantile regression.

Nonconjugate Variational Message Passing (NCVMP) is a structured optimization framework for variational inference in Bayesian models with nonconjugate or non-differentiable likelihood factors. By generalizing the classical variational message passing (VMP) approach beyond conjugate exponential families, NCVMP enables scalable, modular, and accurate approximate posterior inference for a broad class of hierarchical models, including generalized linear mixed models (GLMMs), sparse spectrum Gaussian process (GP) regression, support vector machines, and quantile regression. This methodology fundamentally relies on leveraging fixed-point equations for variational exponential family factors, employing natural gradients in the evidence lower bound (ELBO) landscape, and using efficient one-dimensional quadrature to bypass the computational bottlenecks of high-dimensional integrations (Castiglione et al., 2022, Tan et al., 2013, Tan et al., 2012, Tan et al., 2012).

1. Variational Inference for Nonconjugate Models

Variational inference seeks an approximating density q(\theta) to the true Bayesian posterior p(\theta \mid y), targeting minimization of the Kullback-Leibler divergence or, equivalently, maximization of the ELBO:

\mathcal{L}(q) = \mathbb{E}_{q(\theta)}[\log p(y,\theta)] - \mathbb{E}_{q(\theta)}[\log q(\theta)].

Classical VMP is tractable when each factor in the model's factor graph is conjugate to its associated variable, leading to closed-form coordinate updates. However, in regression models with complex likelihoods—such as non-differentiable loss terms or priors outside the conjugate exponential class—these closed forms are unavailable or intractable. NCVMP circumvents this by employing natural-gradient fixed points for exponential-family variational factors, allowing flexible posterior approximation and supporting models with nonconjugate structure (Castiglione et al., 2022, Tan et al., 2012).
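The role of the ELBO as the working objective can be made concrete with a minimal sketch: a Bayesian logistic regression with a scalar coefficient, where the Bernoulli likelihood is nonconjugate to the Gaussian prior and the expectation term in the ELBO has no closed form, so it is estimated here by Monte Carlo. The model, data, and sample sizes below are illustrative choices, not taken from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonconjugate model: Bayesian logistic regression with a N(0, 1) prior
# on a scalar coefficient theta; the Bernoulli likelihood is not conjugate
# to the Gaussian prior, so E_q[log p(y, theta)] has no closed form.
x = rng.normal(size=50)
y = (rng.uniform(size=50) < 1.0 / (1.0 + np.exp(-0.8 * x))).astype(float)

def log_joint(theta):
    """log p(y, theta) = log-likelihood + log-prior (up to constants)."""
    logits = theta * x
    loglik = np.sum(y * logits - np.log1p(np.exp(logits)))
    logprior = -0.5 * theta ** 2
    return loglik + logprior

def elbo_mc(m, s, n_samples=5000):
    """Monte Carlo estimate of E_q[log p(y, theta)] - E_q[log q(theta)]
    for a Gaussian variational factor q = N(m, s^2)."""
    theta = m + s * rng.normal(size=n_samples)
    entropy = 0.5 * np.log(2 * np.pi * np.e * s ** 2)  # exact Gaussian entropy
    return np.mean([log_joint(t) for t in theta]) + entropy

# A variational factor placed near the data-generating value attains a
# higher ELBO than one placed far from it.
print(elbo_mc(0.8, 0.3) > elbo_mc(-2.0, 0.3))  # expected: True
```

NCVMP replaces this generic Monte Carlo view with fixed-point updates that exploit the exponential-family form of each factor, as described in Section 3.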

2. Model Specification and Factor-Graph Representation

In NCVMP, the model and inference structure are formalized in terms of a bipartite factor graph. Variable nodes represent latent parameters (e.g., regression coefficients, random effects, variance components), while factor nodes correspond to likelihood and prior contributions. For example, a generalized Bayesian regression model may be specified as follows:

  • Linear predictor: \eta_i = x_i^\top\beta + z_i^\top u, \quad i = 1, \dots, n
  • Pseudo-likelihood: \pi(y \mid \theta) \propto \exp\{-\sum_i \psi_0(y_i, \eta_i)/\phi\}
  • Priors: Gaussian for the effects, inverse-gamma or half-Cauchy for the variances

The joint model factorizes as p(\theta, y) = p(\beta) \left[\prod_h p(u_h \mid \sigma_h^2)\, p(\sigma_h^2)\right] p(\sigma^2) \exp\{-\sum_i \psi_0(y_i, \eta_i)/\phi\}, with nonconjugacy handled naturally at the message-passing level (Castiglione et al., 2022, Tan et al., 2013).

3. NCVMP Algorithmic Framework

3.1 Variational Factorization and Update Equations

The mean-field factorization q(\theta) = \prod_l q_l(\theta_l) is imposed, with each q_l in a chosen exponential family. The NCVMP update for the natural parameters \lambda_l of q_l is given by

\lambda_l \leftarrow \mathcal{V}_l(\lambda_l)^{-1} \nabla_{\lambda_l} \mathbb{E}_q[\log p(y, \theta)],

where \mathcal{V}_l is the Fisher information matrix of q_l. For Gaussian variational factors (common in regression and mixed models), this yields updates of the form

m^{(t+1)} = m^{(t)} - [H^{(t)}]^{-1} g^{(t)}, \quad S^{(t+1)} = -[H^{(t)}]^{-1},

with explicit formulas for the gradient g^{(t)} and Hessian H^{(t)} depending on expectations under q of derivatives of the loss \psi_0 (Castiglione et al., 2022, Tan et al., 2012).
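The mechanics of the Gaussian update can be illustrated with a minimal sketch on a toy conjugate Gaussian-mean model, where the fixed point is checkable against the exact posterior; in a genuinely nonconjugate model, g and H would instead involve the quadrature-based expectations of Section 3.2. All numerical values here are illustrative.

```python
import numpy as np

# Sketch of the Gaussian NCVMP fixed-point update
#   m <- m - H^{-1} g,   S <- -H^{-1},
# where g and H are the gradient and Hessian of E_q[log p(y, theta)]
# with respect to the variational mean m. For a Gaussian likelihood the
# update reaches the exact posterior in one step, which makes the
# mechanics easy to verify.
rng = np.random.default_rng(1)
n, sigma2, tau2 = 100, 0.5, 2.0            # noise and prior variances
y = 1.3 + np.sqrt(sigma2) * rng.normal(size=n)

def grad_hess(m):
    """Gradient/Hessian of E_q[log p(y, theta)] at mean m (scalar model)."""
    g = np.sum(y - m) / sigma2 - m / tau2
    H = -n / sigma2 - 1.0 / tau2
    return g, H

m = 0.0                                    # initial variational mean
for _ in range(2):                         # one step already suffices here
    g, H = grad_hess(m)
    m = m - g / H                          # m <- m - H^{-1} g
    S = -1.0 / H                           # S <- -H^{-1}

# Exact conjugate posterior for comparison
S_exact = 1.0 / (n / sigma2 + 1.0 / tau2)
m_exact = S_exact * np.sum(y) / sigma2
print(np.isclose(m, m_exact), np.isclose(S, S_exact))  # True True
```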

3.2 Handling Nonconjugacy

A key property of NCVMP is that, for nonconjugate factors, the expectations required by the updates reduce to one-dimensional Gaussian integrals of the form

\Psi_{r,i} = \frac{\partial^r}{\partial m_i^r} \, \mathbb{E}_{N(m_i, \nu_i^2)}[\psi_0(y_i, \eta)]

for r = 0, 1, 2. These are efficiently evaluated by adaptive Gauss-Hermite quadrature, even for non-differentiable or otherwise intractable likelihoods; this property is what extends NCVMP to models with arbitrary loss functions (Castiglione et al., 2022, Tan et al., 2012).
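A minimal sketch of the quadrature step, using NumPy's Gauss-Hermite rule; the 40-point rule and the hinge-loss example are illustrative choices, not prescriptions from the cited papers.

```python
import numpy as np

# One-dimensional quadrature step: expectations of a loss psi0 under
# N(m, nu^2) via Gauss-Hermite quadrature,
#   E[f(eta)] ~ (1/sqrt(pi)) * sum_k w_k f(m + sqrt(2)*nu*t_k),
# which works even for non-differentiable losses such as the hinge loss.
nodes, weights = np.polynomial.hermite.hermgauss(40)

def gauss_expect(f, m, nu):
    """Approximate E_{N(m, nu^2)}[f(eta)] with 40-point Gauss-Hermite."""
    eta = m + np.sqrt(2.0) * nu * nodes
    return np.sum(weights * f(eta)) / np.sqrt(np.pi)

# Sanity check against a closed form: E[eta^2] = m^2 + nu^2.
print(np.isclose(gauss_expect(lambda e: e ** 2, 1.5, 0.7), 1.5 ** 2 + 0.7 ** 2))

# Non-differentiable example: expected hinge loss E[max(0, 1 - eta)],
# the quantity needed for an SVM pseudo-likelihood with label +1.
print(gauss_expect(lambda e: np.maximum(0.0, 1.0 - e), m=2.0, nu=0.5))
```

Derivatives with respect to m_i (the r = 1, 2 cases) can be evaluated with the same node set, which is what keeps the per-factor cost low.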

3.3 Stochastic Variational Updates

To scale NCVMP to large datasets, minibatch variants are implemented: at each iteration, a random subsample is used to estimate the requisite gradients and Hessians, and the natural parameters are updated using a Robbins-Monro step-size schedule. This reduces the per-iteration complexity from O(n d_*^2 + d_*^3) to O(s d_*^2 + d_*^3) for minibatch size s \ll n, at the expense of added stochasticity (Castiglione et al., 2022, Tan et al., 2012). Convergence is monitored via the ELBO. Hybrid schedules, in which stochastic updates are used initially and then switched to batch updates, accelerate optimization for moderate-size problems (Tan et al., 2012).
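The minibatch schedule can be sketched on a toy Gaussian-mean model, where the batch fixed point is known in closed form and the stochastic iterates can be compared against it. The step-size constants, batch size, and parameter values below are illustrative assumptions, not values prescribed by the referenced papers.

```python
import numpy as np

# Sketch of a stochastic NCVMP schedule: at each iteration a minibatch
# estimates the full-data fixed-point target, and the natural parameter
# is moved toward it with Robbins-Monro step sizes rho_t = (t + t0)^(-kappa).
rng = np.random.default_rng(2)
n, sigma2, tau2 = 10_000, 1.0, 10.0
y = 0.7 + rng.normal(size=n)

s = 100                                    # minibatch size (illustrative)
lam = np.array([0.0, -0.5])                # natural params of q = N(m, S)
t0, kappa = 10.0, 0.7                      # Robbins-Monro constants (illustrative)

for t in range(500):
    yb = y[rng.integers(0, n, size=s)]
    # Minibatch estimate of the batch fixed point (likelihood term rescaled
    # by n/s so the subsample stands in for the full dataset):
    prec = n / sigma2 + 1.0 / tau2
    target = np.array([(n / s) * np.sum(yb) / sigma2, -0.5 * prec])
    rho = (t + t0) ** (-kappa)
    lam = (1.0 - rho) * lam + rho * target  # stochastic natural-gradient step

m_hat = -lam[0] / (2.0 * lam[1])           # recover the variational mean
print(m_hat, np.mean(y))                   # close to the batch solution
```

The decaying rho sequence satisfies the usual Robbins-Monro conditions, so the iterates average out the minibatch noise while forgetting the initialization.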

4. Application Domains and Model Classes

NCVMP provides an inference engine for a broad spectrum of nonconjugate models:

  • Generalized Linear Mixed Models (GLMMs): Partial and fully noncentered parametrizations in GLMMs substantially accelerate convergence, adaptively interpolating between the regimes where centering or noncentering is statistically and numerically optimal. This framework is compatible with both Poisson and Bernoulli likelihoods, with nonconjugate terms handled by quadrature (Tan et al., 2012, Tan et al., 2012).
  • Sparse Spectrum Gaussian Process Regression: The spectral parameterization of GP covariances yields nonconjugacy with respect to kernel hyperparameters. NCVMP yields closed-form and quadrature-based updates for both local and global variational parameters, and adaptive natural gradient step sizes further accelerate optimization (Tan et al., 2013).
  • Non-differentiable and Nonconjugate Losses: Support vector machines and quantile regression models, which feature non-differentiable losses, are efficiently accommodated, since NCVMP updates do not require smoothness or analytic forms for loss derivatives (Castiglione et al., 2022).
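As a concrete instance of the non-differentiable case, the quantile-regression check loss can be handled exactly as described in Section 3.2: its Gaussian expectation is smooth even though the loss itself has a kink at y = eta. A sketch, with illustrative quadrature order and parameter values:

```python
import numpy as np

# The check loss psi0(y, eta) = (y - eta) * (tau - 1{y < eta}) is
# non-differentiable at eta = y, yet its Gaussian expectation (the
# quantity NCVMP needs) is smooth and cheap to evaluate by quadrature.
nodes, weights = np.polynomial.hermite.hermgauss(40)

def expected_check_loss(y, m, nu, tau):
    """E_{N(m, nu^2)}[psi0(y, eta)] for the tau-th quantile, by quadrature."""
    eta = m + np.sqrt(2.0) * nu * nodes
    loss = (y - eta) * (tau - (y < eta))
    return np.sum(weights * loss) / np.sqrt(np.pi)

# The expectation varies smoothly in m despite the kink in psi0, and for
# the median (tau = 0.5) it is minimized near m = y.
vals = [expected_check_loss(y=1.0, m=m, nu=0.3, tau=0.5) for m in (0.5, 1.0, 1.5)]
print(vals)
```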

5. Empirical Performance and Complexity

Extensive simulation and real-data studies demonstrate that NCVMP consistently outperforms conjugate mean field variational Bayes (MFVB) and Laplace approximations in terms of the ELBO and accuracy of marginal posteriors (Castiglione et al., 2022). Key findings include:

  • Per-iteration cost for batch NCVMP is nearly identical to MFVB.
  • Wall-clock runtimes are comparable or slightly faster for NCVMP compared to MFVB or Laplace.
  • Stochastic NCVMP is effective for massive datasets, providing favorable trade-offs between speed and approximation accuracy (Tan et al., 2012, Castiglione et al., 2022).
  • Posterior means obtained via NCVMP closely match those of MCMC, though, similar to other VB methods, posterior variances can be underestimated; partial noncentering ameliorates this bias (Tan et al., 2012).

6. Theoretical Properties and Extension Strategies

  • Natural Gradient Structure: The fixed-point equations of NCVMP correspond to natural-gradient ascent in the ELBO, which improves conditioning and may explain empirical acceleration in convergence relative to standard coordinate ascent (Tan et al., 2013).
  • Adaptive Step Sizing: Overrelaxed natural gradient steps, with ELBO-based monitoring, halve convergence time in GP regression and yield similar improvements in other models (Tan et al., 2013).
  • Conflict Diagnostics: The message structure in NCVMP allows principled diagnostics of prior-likelihood conflict as by-products of inference, providing an alternative to model criticism via cross-validatory MCMC (Tan et al., 2012).
  • Model Selection: Since the (optimized) ELBO provides a lower bound to the log marginal likelihood, it can be leveraged for Bayesian model comparison and selection (Tan et al., 2012).

7. Implementation Considerations and Limitations

NCVMP is fully modular: factor-specific updates are algorithmically decoupled, and no model-specific data augmentation or differentiability of the loss is required (Castiglione et al., 2022). However, the quadrature step for each data point in nonconjugate terms induces an O(n) scaling factor; in high-dimensional settings or for complex likelihoods, computational efficiency relies on exploiting sparsity or on the stochastic variants. Known limitations, common to VB, include posterior variance underestimation; stronger guarantees for calibration or uncertainty quantification may require alternative approximations or corrections (Tan et al., 2012).

References

  • "Bayesian non-conjugate regression via variational message passing" (Castiglione et al., 2022)
  • "Variational inference for sparse spectrum Gaussian process regression" (Tan et al., 2013)
  • "Variational Inference for Generalized Linear Mixed Models Using Partially Noncentered Parametrizations" (Tan et al., 2012)
  • "A stochastic variational framework for fitting and diagnosing generalized linear mixed models" (Tan et al., 2012)
