Nonconjugate Variational Message Passing
- Nonconjugate Variational Message Passing is a structured framework enabling approximate Bayesian inference for models with nonconjugate likelihoods.
- It leverages fixed-point equations, natural gradients, and one-dimensional quadrature to efficiently handle complex likelihoods without closed-form solutions.
- NCVMP scales to large datasets via stochastic updates and is applied in GLMMs, sparse GP regression, SVMs, and quantile regression.
Nonconjugate Variational Message Passing (NCVMP) is a structured optimization framework for variational inference in Bayesian models with nonconjugate or non-differentiable likelihood factors. By generalizing the classical variational message passing (VMP) approach beyond conjugate exponential families, NCVMP enables scalable, modular, and accurate approximate posterior inference for a broad class of hierarchical models, including generalized linear mixed models (GLMMs), sparse spectrum Gaussian process (GP) regression, support vector machines, and quantile regression. This methodology fundamentally relies on leveraging fixed-point equations for variational exponential family factors, employing natural gradients in the evidence lower bound (ELBO) landscape, and using efficient one-dimensional quadrature to bypass the computational bottlenecks of high-dimensional integrations (Castiglione et al., 2022, Tan et al., 2013, Tan et al., 2012, Tan et al., 2012).
1. Variational Inference for Nonconjugate Models
Variational inference seeks an approximating density q(θ) to the true Bayesian posterior p(θ | y), targeting minimization of the Kullback-Leibler divergence KL(q(θ) ‖ p(θ | y)) or, equivalently, maximization of the ELBO

L(q) = E_q[log p(y, θ)] − E_q[log q(θ)],

which satisfies log p(y) = L(q) + KL(q(θ) ‖ p(θ | y)).
Classical VMP is tractable when each factor in the model's factor graph is conjugate to its associated variable, leading to closed-form coordinate updates. However, in regression models with complex likelihoods—such as non-differentiable loss terms or priors outside the conjugate exponential class—these closed forms are unavailable or intractable. NCVMP circumvents this by employing natural-gradient fixed points for exponential-family variational factors, allowing flexible posterior approximation and supporting models with nonconjugate structure (Castiglione et al., 2022, Tan et al., 2012).
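To make the objective concrete, the following sketch (a toy conjugate Gaussian model of our own choosing, not from the cited papers) estimates the ELBO by Monte Carlo; because the exact posterior is available here, plugging it in as q makes the bound tight, which is a useful sanity check for any ELBO implementation.

```python
import numpy as np

def log_norm_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def elbo_mc(y, q_mean, q_var, n_samples=10_000, seed=0):
    """Monte Carlo estimate of L(q) = E_q[log p(y, theta) - log q(theta)]
    for the toy model y ~ N(theta, 1), theta ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    theta = q_mean + np.sqrt(q_var) * rng.standard_normal(n_samples)
    log_joint = log_norm_pdf(y, theta, 1.0) + log_norm_pdf(theta, 0.0, 1.0)
    log_q = log_norm_pdf(theta, q_mean, q_var)
    return np.mean(log_joint - log_q)

y = 1.3
# The exact posterior is N(y/2, 1/2); using it as q gives KL(q || p) = 0,
# so the ELBO equals the log evidence log N(y; 0, 2) with zero MC variance.
elbo = elbo_mc(y, q_mean=y / 2, q_var=0.5)
log_evidence = log_norm_pdf(y, 0.0, 2.0)
```

In nonconjugate models the exact posterior is unavailable, but the same estimator still serves as a convergence monitor for the optimization.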
2. Model Specification and Factor-Graph Representation
In NCVMP, the model and inference structure are formalized in terms of a bipartite factor graph. Variable nodes represent latent parameters (e.g., regression coefficients, random effects, variance components), while factor nodes correspond to likelihood and prior contributions. For example, a generalized Bayesian regression model may be specified as follows:
- Linear predictor: η_i = x_iᵀ β + z_iᵀ u, for i = 1, …, n,
- Pseudo-likelihood: p(y | β, u) ∝ exp{−Σ_{i=1}^n ψ(y_i, η_i)}, with ψ a possibly non-differentiable loss,
- Priors: Gaussian for fixed and random effects, inverse gamma or half-Cauchy for variance components.

The joint model factorizes as p(y, β, u, σ²) = p(y | β, u) p(β) p(u | σ²) p(σ²), with nonconjugacy naturally handled at the message-passing level (Castiglione et al., 2022, Tan et al., 2013).
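The factor-graph view can be made concrete by representing each likelihood or prior contribution as a function node over the latent variables; the log joint is then the sum over factor nodes. The sketch below is illustrative, with hypothetical helper names (make_factors, log_joint) and a ridge-type model chosen for simplicity, not code from the cited papers.

```python
import numpy as np

def make_factors(X, y, loss, prior_var=10.0):
    """Build factor nodes for log p(y, beta): one pseudo-likelihood
    factor exp{-loss(y_i, x_i' beta)} per observation, plus a Gaussian
    prior factor on the coefficient vector."""
    factors = [lambda beta, xi=xi, yi=yi: -loss(yi, xi @ beta)
               for xi, yi in zip(X, y)]
    factors.append(lambda beta: -0.5 * beta @ beta / prior_var)
    return factors

def log_joint(factors, beta):
    """Log of the (unnormalized) joint: the sum over factor nodes."""
    return sum(f(beta) for f in factors)

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
beta_true = np.array([1.0, -0.5, 0.25])
y = X @ beta_true + rng.standard_normal(20)
# Any loss slots in here, e.g. a pinball or hinge loss for nonconjugate models.
squared = lambda yi, eta: 0.5 * (yi - eta) ** 2
factors = make_factors(X, y, squared)
lj = log_joint(factors, beta_true)
```

The point of the decomposition is modularity: message-passing updates touch only the factors adjacent to a variable node, so swapping the loss changes no other part of the inference code.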
3. NCVMP Algorithmic Framework
3.1 Variational Factorization and Update Equations
The mean-field factorization q(θ) = ∏_i q_i(θ_i) is imposed, with each q_i(θ_i) in a chosen exponential family with natural parameter λ_i. Writing S(λ) = E_q[log p(y, θ)], the NCVMP update for the natural parameters of q_i(θ_i) is given by

λ_i ← F(λ_i)^{-1} ∇_{λ_i} S(λ),

where F(λ_i) is the Fisher information matrix of q_i, i.e., the covariance of the sufficient statistics of q_i. For Gaussian variational factors q_i(θ_i) = N(μ_i, Σ_i) (common in regression and mixed models), this yields updates of the form

Σ_i ← (−2 ∂S/∂Σ_i)^{-1}, μ_i ← μ_i + Σ_i ∇_{μ_i} S,

with explicit formulas for gradients and Hessians dependent on expectations, under q, of derivatives of the loss (Castiglione et al., 2022, Tan et al., 2012).
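For a single univariate Gaussian factor q(θ) = N(m, v), the update pair reduces to two scalar recursions, v ← −1/(2 ∂S/∂v) and m ← m + v ∂S/∂m. The sketch below iterates them for a one-parameter Poisson log-link toy model of our own choosing; this case is convenient because E_{N(m,v)}[e^θ] = e^{m+v/2} makes S available in closed form.

```python
import numpy as np

def ncvmp_poisson(y, prior_var=4.0, n_iter=100):
    """NCVMP fixed-point iterations for q(theta) = N(m, v) under
    y ~ Poisson(exp(theta)), theta ~ N(0, prior_var).
    Up to a constant, S(m, v) = y*m - exp(m + v/2) - (m^2 + v)/(2*prior_var)."""
    m, v = 0.0, 1.0
    for _ in range(n_iter):
        dS_dv = -0.5 * np.exp(m + v / 2) - 0.5 / prior_var
        v = -1.0 / (2.0 * dS_dv)      # Sigma update: v <- (-2 dS/dv)^{-1}
        dS_dm = y - np.exp(m + v / 2) - m / prior_var
        m = m + v * dS_dm             # mu update: m <- m + v dS/dm
    return m, v

m, v = ncvmp_poisson(y=7)
```

At a fixed point the mean recursion is stationary, i.e., ∂S/∂m = 0, which gives a direct convergence check without evaluating the ELBO.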
3.2 Handling Nonconjugacy
Critical to NCVMP is that the required expectations in the updates involve, for nonconjugate factors, only one-dimensional Gaussian integrals of derivatives of the loss ψ, of the form

E_q[ψ^{(r)}(y_i, η_i)] = ∫ ψ^{(r)}(y_i, η) φ(η; m_i, s_i²) dη,

where φ(·; m, s²) denotes the N(m, s²) density and m_i, s_i² are the mean and variance of the linear predictor η_i under q. These are efficiently evaluated by adaptive Gauss-Hermite quadrature, even for non-differentiable or otherwise intractable likelihoods. This key property enables applicability to models with arbitrary loss functions (Castiglione et al., 2022, Tan et al., 2012).
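As an illustration, the expectation of the non-differentiable pinball loss of quantile regression under a Gaussian q can be computed by a plain (non-adaptive) Gauss-Hermite rule; the loss, node count, and parameter values below are our own illustrative choices. For this particular loss a closed form exists, which the sketch uses as a check.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from math import erf, exp, pi, sqrt

def gauss_hermite_expect(f, mean, sd, n_nodes=100):
    """E_{N(mean, sd^2)}[f(x)] via Gauss-Hermite quadrature:
    int f(x) exp(-x^2) dx ~ sum_j w_j f(x_j), after the change of
    variables x = mean + sqrt(2)*sd*z."""
    nodes, weights = hermgauss(n_nodes)
    x = mean + np.sqrt(2.0) * sd * nodes
    return np.sum(weights * f(x)) / np.sqrt(np.pi)

def pinball(u, tau):
    """Quantile-regression loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

y, m, s, tau = 0.7, 0.2, 0.9, 0.25
approx = gauss_hermite_expect(lambda eta: pinball(y - eta, tau), m, s)

# Closed-form check: with u = y - eta ~ N(mu, s^2),
# E[rho_tau(u)] = tau*E[u^+] + (1-tau)*E[u^+ - mu].
def std_norm_pdf(z): return exp(-0.5 * z * z) / sqrt(2 * pi)
def std_norm_cdf(z): return 0.5 * (1 + erf(z / sqrt(2)))
mu = y - m
pos_part = mu * std_norm_cdf(mu / s) + s * std_norm_pdf(mu / s)
exact = tau * pos_part + (1 - tau) * (pos_part - mu)
```

A fixed rule converges only algebraically at the kink, which is why the cited work uses adaptive quadrature; even so, modest node counts give accuracy well beyond what the variational approximation itself warrants.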
3.3 Stochastic Variational Updates
To scale NCVMP to large datasets, minibatch variants are implemented: at each iteration, a random subsample is used to estimate the requisite gradients and Hessians, and natural parameters are updated using a Robbins-Monro step-size schedule. This reduces per-iteration complexity from O(n) to O(m) for minibatch size m ≪ n, at the expense of added stochasticity (Castiglione et al., 2022, Tan et al., 2012). Convergence is monitored via the ELBO. Hybrid schedules, in which stochastic updates are used initially and then switched to batch updates, accelerate optimization for moderate-size problems (Tan et al., 2012).
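A minimal sketch of the minibatch scheme follows, on a toy conjugate Gaussian-mean model so the full-data answer is known in closed form; the polynomial step-size schedule is one common Robbins-Monro choice, and all names are illustrative.

```python
import numpy as np

def stochastic_natural_update(y, prior_prec=0.1, m_batch=50,
                              n_iter=2000, seed=0):
    """Robbins-Monro updates of the natural parameters (lam1, lam2) of
    q(theta) = N(lam1/lam2, 1/lam2) for y_i ~ N(theta, 1),
    theta ~ N(0, 1/prior_prec). Each minibatch gives an unbiased
    estimate of the full-data natural parameters by rescaling the
    likelihood contribution by n/m."""
    rng = np.random.default_rng(seed)
    n = len(y)
    lam1, lam2 = 0.0, 1.0
    for t in range(1, n_iter + 1):
        batch = rng.choice(n, size=m_batch, replace=False)
        lam1_hat = (n / m_batch) * y[batch].sum()  # noisy estimate of sum(y)
        lam2_hat = prior_prec + n                  # precision is exact here
        rho = (t + 10.0) ** -0.7                   # Robbins-Monro step size
        lam1 = (1 - rho) * lam1 + rho * lam1_hat
        lam2 = (1 - rho) * lam2 + rho * lam2_hat
    return lam1 / lam2, 1.0 / lam2                 # posterior mean, variance

rng = np.random.default_rng(42)
y = 2.0 + rng.standard_normal(1000)
mean_est, var_est = stochastic_natural_update(y)
exact_mean = y.sum() / (0.1 + len(y))
```

Averaging in the natural-parameter space, rather than in the mean parameters, is what makes the convex-combination update valid, since natural parameters of exponential families combine linearly.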
4. Application Domains and Model Classes
NCVMP provides an inference engine for a broad spectrum of nonconjugate models:
- Generalized Linear Mixed Models (GLMMs): Partial and fully noncentered parametrizations in GLMMs substantially accelerate convergence, adaptively interpolating between the regimes where centering or noncentering is statistically and numerically optimal. This framework is compatible with both Poisson and Bernoulli likelihoods, with nonconjugate terms handled by quadrature (Tan et al., 2012, Tan et al., 2012).
- Sparse Spectrum Gaussian Process Regression: The spectral parameterization of GP covariances yields nonconjugacy with respect to kernel hyperparameters. NCVMP yields closed-form and quadrature-based updates for both local and global variational parameters, and adaptive natural gradient step sizes further accelerate optimization (Tan et al., 2013).
- Non-differentiable and Nonconjugate Losses: Support vector machines and quantile regression models, which feature non-differentiable losses, are efficiently accommodated, since NCVMP updates do not require smoothness or analytic forms for loss derivatives (Castiglione et al., 2022).
5. Empirical Performance and Complexity
Extensive simulation and real-data studies demonstrate that NCVMP consistently outperforms conjugate mean field variational Bayes (MFVB) and Laplace approximations in terms of the ELBO and accuracy of marginal posteriors (Castiglione et al., 2022). Key findings include:
- Per-iteration cost for batch NCVMP is nearly identical to that of MFVB.
- Wall-clock runtimes for NCVMP are comparable to, or slightly shorter than, those of MFVB and Laplace approximations.
- Stochastic NCVMP is effective for massive datasets, providing favorable trade-offs between speed and approximation accuracy (Tan et al., 2012, Castiglione et al., 2022).
- Posterior means obtained via NCVMP closely match those of MCMC, though, similar to other VB methods, posterior variances can be underestimated; partial noncentering ameliorates this bias (Tan et al., 2012).
6. Theoretical Properties and Extension Strategies
- Natural Gradient Structure: The fixed-point equations of NCVMP correspond to natural-gradient ascent in the ELBO, which improves conditioning and may explain empirical acceleration in convergence relative to standard coordinate ascent (Tan et al., 2013).
- Adaptive Step Sizing: Overrelaxed natural gradient steps, with ELBO-based monitoring, halve convergence time in GP regression and yield similar improvements in other models (Tan et al., 2013).
- Conflict Diagnostics: The message structure in NCVMP allows principled diagnostics of prior-likelihood conflict as by-products of inference, providing an alternative to model criticism via cross-validatory MCMC (Tan et al., 2012).
- Model Selection: Since the (optimized) ELBO provides a lower bound to the log marginal likelihood, it can be leveraged for Bayesian model comparison and selection (Tan et al., 2012).
7. Implementation Considerations and Limitations
NCVMP is fully modular: factor-specific updates are algorithmically decoupled, and no model-specific data augmentation or differentiability of the loss is required (Castiglione et al., 2022). However, the quadrature step for each data point in nonconjugate terms induces a cost that scales linearly in the number of observations n; in high-dimensional settings or for complex likelihoods, computational efficiency relies on exploiting sparsity or using stochastic variants. Known limitations, common to VB, include posterior variance underestimation; stronger guarantees for calibration or uncertainty quantification may require alternative approximations or corrections (Tan et al., 2012).
References
- "Bayesian non-conjugate regression via variational message passing" (Castiglione et al., 2022)
- "Variational inference for sparse spectrum Gaussian process regression" (Tan et al., 2013)
- "Variational Inference for Generalized Linear Mixed Models Using Partially Noncentered Parametrizations" (Tan et al., 2012)
- "A stochastic variational framework for fitting and diagnosing generalized linear mixed models" (Tan et al., 2012)