
Nonconjugate Variational Message Passing

Updated 24 January 2026
  • Nonconjugate variational message passing is a framework that extends traditional variational methods to handle Bayesian models without conjugate relationships.
  • It employs block-coordinate natural-gradient optimization and numerical quadrature to compute intractable expectations, enhancing inference in models like GLMMs and logistic regression.
  • Stochastic implementations use mini-batch sampling and Robbins–Monro updates, enabling scalable and efficient posterior approximation in complex hierarchical models.

Nonconjugate variational message passing (NCVMP) is a class of algorithms extending variational message passing (VMP) to Bayesian models with nonconjugate likelihoods or prior–likelihood pairs. These methods perform approximate posterior inference using exponential-family variational approximations, even when the classical conjugate relationships required for closed-form variational Bayes updates are absent. NCVMP encompasses the key principles of block-coordinate natural-gradient optimization, flexible factor graph representations, and numerical quadrature for handling intractable expected sufficient statistics. This framework has facilitated scalable and accurate variational inference for hierarchical models such as generalized linear mixed models (GLMMs), regression models with non-differentiable losses, and structured prediction architectures.

1. Model Setting and Factor Graph Representation

NCVMP is formulated for Bayesian hierarchical models with potentially nonconjugate losses. Consider observations $y = (y_1, \dots, y_n)$ modeled by a "pseudo-likelihood" defined via a loss function $L(y_i, \eta_i)$, where the linear predictor is

$$\eta_i = x_i^\top \beta + z_i^\top u$$

with fixed effects $\beta \in \mathbb{R}^p$ and random effects $u = (u_1, \dots, u_H)$, $u_h \in \mathbb{R}^{d_h}$. The loss-driven likelihood takes the form

$$\log \pi(y \mid \beta, u, \sigma^2) = -\frac{1}{\sigma^2} \sum_{i=1}^n L(y_i, \eta_i),$$

where $\sigma^2 > 0$ is a dispersion parameter. $L$ may be non-differentiable (e.g., quantile and hinge losses). Priors are typically chosen from conditionally conjugate families, such as

$$\beta \sim \mathcal{N}(0, \sigma_\beta^2 R_\beta^{-1}), \quad u_h \mid \sigma_h^2 \sim \mathcal{N}(0, \sigma_h^2 R_h^{-1}), \quad \sigma_h^2 \sim \operatorname{IG}(A_h, B_h), \quad \sigma^2 \sim \operatorname{IG}(A_0, B_0).$$

The joint posterior factors naturally into a product of prior and likelihood terms; this permits construction of the corresponding factor graph, with factor nodes for the data likelihoods $L(y_i, \eta_i)$, priors, and hyperpriors, and variable nodes for each model parameter (Castiglione et al., 2022).

2. Variational Family and Objective Function

NCVMP adopts a mean-field product-form exponential-family variational approximation

$$q(\beta, u, \sigma^2, \sigma_1^2, \dots, \sigma_H^2) = q(\beta, u)\, q(\sigma^2) \prod_{h=1}^H q(\sigma_h^2),$$

where each block belongs to an exponential family,

$$q_j(\theta_j) = \exp\{\lambda_j^\top t_j(\theta_j) - A_j(\lambda_j)\},$$

with canonical parameters $\lambda_j$, sufficient statistics $t_j$, and log-partition function $A_j$; in particular, $q(\beta, u)$ is multivariate Gaussian and the variance blocks are inverse-gamma. The goal is to minimize the Kullback–Leibler divergence $\operatorname{KL}(q \,\|\, \pi(\cdot \mid y))$, equivalent to maximizing the evidence lower bound (ELBO)

$$\mathcal{L}(q) = \mathbb{E}_q[\log \pi(y, \theta)] - \mathbb{E}_q[\log q(\theta)].$$

NCVMP performs block-coordinate fixed-point updates of the canonical parameters, shown to correspond to natural-gradient ascent in the space of exponential-family parameters:

$$\lambda_j \leftarrow \lambda_j + F(\lambda_j)^{-1} \nabla_{\lambda_j} \mathcal{L},$$

where $F(\lambda_j) = \operatorname{Cov}_{q_j}[t_j(\theta_j)]$ is the Fisher information of $q_j$ (Castiglione et al., 2022, Tan et al., 2012).
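The equivalence between the fixed-point update and a unit-step natural-gradient step can be illustrated on a conjugate toy model, where a single update lands exactly on the true posterior. This is a minimal sketch under an assumed toy model ($\theta \sim \mathcal{N}(0,1)$, $y_i \mid \theta \sim \mathcal{N}(\theta, 1)$), not the papers' implementation:

```python
import numpy as np

# NCVMP Gaussian fixed-point update (natural-gradient ascent with unit step)
# on a conjugate toy model: theta ~ N(0,1), y_i | theta ~ N(theta, 1).
# For a Gaussian q, the update reads:
#   precision <- -2 dS/dsigma2,   mu <- mu + sigma2_new * dS/dmu,
# where S(mu, sigma2) = E_q[log p(y, theta)].
rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=50)
n = y.size

mu, sigma2 = 0.0, 1.0                  # initial q(theta) = N(mu, sigma2)

# Partial derivatives of S are analytic in this conjugate toy model:
dS_dmu = np.sum(y - mu) - mu           # d/dmu     E_q[log p(y, theta)]
dS_dsigma2 = -(n + 1) / 2.0            # d/dsigma2 (constant in this model)

sigma2 = -1.0 / (2.0 * dS_dsigma2)     # precision <- -2 dS/dsigma2
mu = mu + sigma2 * dS_dmu              # mu <- mu + sigma2_new * dS/dmu
```

One step recovers the exact posterior $\mathcal{N}(\sum_i y_i/(n+1),\, 1/(n+1))$, which is the sense in which the fixed-point scheme generalizes conjugate variational Bayes.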

3. Message Passing and Handling Nonconjugacy

In NCVMP, each factor $f$ sends a message to a variable $\theta_j$ based on the expected log-factor under the variational distributions of the remaining variables:

$$m_{f \to \theta_j}(\theta_j) \propto \exp\{\mathbb{E}_{q_{-j}}[\log f(\theta)]\}.$$

Conjugate factors (Gaussian, gamma, etc.) yield closed-form messages corresponding to canonical-parameter contributions.

For nonconjugate data factors—such as those arising from non-differentiable or non-analytic loss functions—the message into the Gaussian block $q(\beta, u) = \mathcal{N}(\mu, \Sigma)$ involves the expectation of the loss under the variational Gaussian marginal for the linear predictor. For each data unit,

$$\eta_i \sim \mathcal{N}(m_i, v_i), \quad m_i = c_i^\top \mu, \quad v_i = c_i^\top \Sigma c_i, \quad c_i = (x_i^\top, z_i^\top)^\top$$

under $q(\beta, u)$. The required calculations are summarized by "smoothed-loss" integrals

$$\bar L(m_i, v_i) = \mathbb{E}[L(y_i, \eta_i)] = \int L(y_i, \eta)\, \phi(\eta;\, m_i, v_i)\, d\eta,$$

where $\phi(\cdot\,; m, v)$ is the $\mathcal{N}(m, v)$ density. Differentiating under the integral sign, $\partial \bar L / \partial m_i$ and $\partial^2 \bar L / \partial m_i^2$ yield the expectation, gradient, and Hessian contributions needed for the quadratic updates. If $L$ is non-differentiable, the existence of these derivatives is established via expectations of weak derivatives, and closed-form formulae may be available (e.g., quantile and hinge losses) (Castiglione et al., 2022, Tan et al., 2012). Otherwise, one-dimensional quadrature (Gauss–Hermite, Clenshaw–Curtis) is used.
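The quadrature step can be sketched for the logistic loss, whose smoothed integral has no closed form. This is an illustrative implementation with assumed function names, using 30-point Gauss–Hermite quadrature in its probabilists' form:

```python
import numpy as np

# Smoothed logistic loss Lbar(m, v) = E[L(y, eta)] for eta ~ N(m, v), with
# L(y, eta) = log(1 + e^eta) - y*eta, and its first two m-derivatives,
# computed by 30-point Gauss-Hermite quadrature (weight exp(-x^2/2)).
nodes, weights = np.polynomial.hermite_e.hermegauss(30)
weights = weights / weights.sum()             # normalize so that E[1] = 1

def smoothed_logistic(y, m, v):
    eta = m + np.sqrt(v) * nodes              # abscissae mapped to N(m, v)
    sig = 1.0 / (1.0 + np.exp(-eta))
    Lbar = np.sum(weights * (np.logaddexp(0.0, eta) - y * eta))  # E[L]
    grad = np.sum(weights * (sig - y))        # d E[L] / dm  (expected derivative)
    hess = np.sum(weights * sig * (1.0 - sig))  # d^2 E[L] / dm^2
    return Lbar, grad, hess

Lbar, grad, hess = smoothed_logistic(y=1, m=0.3, v=0.8)
```

The same three quantities drive the Gaussian-block update below; for losses with closed-form smoothed integrals (quantile, hinge), the quadrature is unnecessary.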

4. Algorithmic Implementation: Batch and Stochastic NCVMP

The batch NCVMP algorithm iteratively:

  • Computes the smoothed-loss integrals $\bar L(m_i, v_i)$ and their first two derivatives for $i = 1, \dots, n$
  • Updates dispersion and random-effect scale parameters via closed-form
  • Forms prior precision matrices
  • Calculates the natural-gradient Gaussian updates for mean and covariance
  • Optionally recomputes ELBO and checks convergence

Explicit update equations for the block parameters are provided. For the Gaussian block, writing $S(\mu, \Sigma) = \mathbb{E}_q[\log \pi(y, \theta)]$, the natural-gradient fixed-point updates take the standard NCVMP form

$$\Sigma^{-1} \leftarrow -2\, \frac{\partial S}{\partial \Sigma}, \qquad \mu \leftarrow \mu + \Sigma\, \frac{\partial S}{\partial \mu}.$$

Expanding the derivatives for the loss-based model gives

$$\Sigma^{-1} \leftarrow \bar P + \frac{1}{\bar\sigma^2} \sum_{i=1}^n \frac{\partial^2 \bar L(m_i, v_i)}{\partial m_i^2}\, c_i c_i^\top, \qquad \mu \leftarrow \mu - \Sigma \Big( \bar P \mu + \frac{1}{\bar\sigma^2} \sum_{i=1}^n \frac{\partial \bar L(m_i, v_i)}{\partial m_i}\, c_i \Big),$$

where $c_i = (x_i^\top, z_i^\top)^\top$, $\bar L(m_i, v_i)$ is the smoothed loss evaluated at $m_i = c_i^\top \mu$ and $v_i = c_i^\top \Sigma c_i$, $\bar P$ is the expected prior precision of $(\beta, u)$, and $1/\bar\sigma^2 = \mathbb{E}_q[1/\sigma^2]$.

Stochastic NCVMP scales to large datasets by sampling mini-batches of size $m \ll n$, forming unbiased estimates of the full-data gradient contributions, and replacing sums over $i = 1, \dots, n$ by $(n/m)$-scaled sums over the batch. Robbins–Monro step sizes update the canonical parameters. Other blocks (inverse-gamma factors) adopt similar stochastic updates (Castiglione et al., 2022, Tan et al., 2012).
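The mini-batch scaling and Robbins–Monro averaging can be sketched on the same conjugate toy model used earlier ($\theta \sim \mathcal{N}(0,1)$, $y_i \mid \theta \sim \mathcal{N}(\theta,1)$, an assumption for illustration): batches estimate the full-data natural-parameter fixed point, and the averaged iterates converge to the exact posterior.

```python
import numpy as np

# Stochastic natural-parameter updates with Robbins-Monro averaging:
#   lam <- (1 - rho_t) * lam + rho_t * lam_hat,
# where lam_hat is an unbiased mini-batch estimate of the full-data fixed point.
rng = np.random.default_rng(2)
n, m = 1000, 50
y = rng.normal(2.0, 1.0, size=n)

lam1, lam2 = 0.0, -0.5                  # natural parameters of q(theta) = N(mu, s2)
for t in range(1, 2001):
    batch = rng.choice(n, size=m, replace=False)
    lam1_hat = (n / m) * y[batch].sum() # unbiased estimate of sum_i y_i
    lam2_hat = -(n + 1) / 2.0           # precision part is deterministic here
    rho = (t + 10) ** -0.7              # Robbins-Monro step size
    lam1 = (1 - rho) * lam1 + rho * lam1_hat
    lam2 = (1 - rho) * lam2 + rho * lam2_hat

mu, s2 = -lam1 / (2 * lam2), -1.0 / (2 * lam2)  # back to mean/variance
```

With step sizes satisfying the Robbins–Monro conditions ($\sum_t \rho_t = \infty$, $\sum_t \rho_t^2 < \infty$), the iterates approach the batch fixed point $\mu = \sum_i y_i/(n+1)$, $s^2 = 1/(n+1)$ without ever touching all $n$ observations in one update.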

5. Computational Complexity and Practical Considerations

  • Batch NCVMP: per-iteration cost is dominated by the $n$ one-dimensional quadratures for the smoothed-loss integrals and by dense matrix operations in the Gaussian update (cubic in the Gaussian block dimension $p + \sum_h d_h$).
  • Stochastic NCVMP: per-update cost is reduced in proportion to the batch size, with memory proportional to the batch.
  • Conjugate VMP avoids quadrature, yielding closed forms, but has a similar dominant cost in matrix operations.
  • Standard black-box variational inference requires Monte Carlo estimation over the full block dimension, which is often more expensive and suffers from high variance.

Initialization from penalized quasi-likelihood helps avoid poor local optima. Damping, or fixing tuning parameters (e.g., partial noncentering matrices), can stabilize convergence. The ELBO serves both as a convergence monitor and as a marginal-likelihood proxy for model comparison (Castiglione et al., 2022, Tan et al., 2012).

6. Examples: Quantile, Logistic, and Mixed Models

In quantile regression at level $\tau \in (0, 1)$, the loss is the pinball loss

$$L_\tau(y_i, \eta_i) = (y_i - \eta_i)\big(\tau - \mathbf{1}\{y_i < \eta_i\}\big),$$

with closed-form smoothed-loss integrals: writing $a_i = y_i - m_i$,

$$\bar L(m_i, v_i) = a_i \big(\tau - \Phi(-a_i/\sqrt{v_i})\big) + \sqrt{v_i}\, \varphi(a_i/\sqrt{v_i}),$$

$$\frac{\partial \bar L}{\partial m_i} = \Phi(-a_i/\sqrt{v_i}) - \tau, \qquad \frac{\partial^2 \bar L}{\partial m_i^2} = \frac{\varphi(a_i/\sqrt{v_i})}{\sqrt{v_i}},$$

where $\Phi$ and $\varphi$ denote the standard normal CDF and density. Updates then follow through the general NCVMP equations. For logistic regression, the loss $L(y_i, \eta_i) = \log(1 + e^{\eta_i}) - y_i \eta_i$ lacks analytic smoothed-loss integrals, so Gauss–Hermite quadrature is applied. GLMMs with partial noncentering leverage tuning matrices chosen by the local curvature of the likelihood to adapt the model representation and accelerate convergence (Castiglione et al., 2022, Tan et al., 2012).
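The closed-form smoothed pinball loss can be checked numerically against Monte Carlo. This is a standalone sketch with assumed function names:

```python
import math
import numpy as np

# Closed-form smoothed pinball loss E[L_tau(y, eta)] for eta ~ N(m, v), where
# L_tau(y, eta) = (y - eta) * (tau - 1{y < eta}), verified by Monte Carlo.
def Phi(x):                                   # standard normal CDF via erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def phi(x):                                   # standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def smoothed_pinball(tau, y, m, v):
    a, s = y - m, math.sqrt(v)
    return a * (tau - Phi(-a / s)) + s * phi(a / s)

tau, y, m, v = 0.3, 1.0, 0.2, 0.5
closed = smoothed_pinball(tau, y, m, v)

rng = np.random.default_rng(3)
eta = rng.normal(m, math.sqrt(v), size=400_000)
mc = np.mean((y - eta) * (tau - (y < eta)))   # Monte Carlo estimate of E[L_tau]
```

As the variance $v \to 0$ the smoothing vanishes and the integral collapses to the raw pinball loss $L_\tau(y, m)$, which is a useful sanity check on any implementation.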

7. Extensions and Applications

Stochastic NCVMP integrates natural-gradient steps with mini-batch sampling, permitting scaling to massive datasets. The same message structure underlies diagnostics for prior–likelihood conflict, yielding computationally efficient p-values for Bayesian model checking (Tan et al., 2012). In hierarchical models such as GLMMs, partial noncentering adaptively yields improved approximation accuracy and accelerated convergence relative to both centered and noncentered schemes (Tan et al., 2012). The approach is broadly applicable to models with non-analytic, non-differentiable, or nonconjugate losses, encompassing settings such as support vector machines and mixed additive models.

Table: Summary of NCVMP Features

Aspect | Description | Source
--- | --- | ---
Variational family | Mean-field, exponential family (Gaussian, IG) | (Castiglione et al., 2022)
Nonconjugacy handling | 1-D quadrature for expected sufficient statistics | (Castiglione et al., 2022)
Optimization method | Block-coordinate, natural-gradient, fixed-point | (Tan et al., 2012)
Stochastic extension | Robbins–Monro step sizes, mini-batch sampling | (Tan et al., 2012)
Application domains | GLMMs, quantile, logistic, general regression | (Castiglione et al., 2022)
Computational cost | Dominated by 1-D quadrature and matrix operations; batch/stochastic variants | (Castiglione et al., 2022)

NCVMP thus enables scalable, accurate Bayesian inference in nonconjugate models, with broad applicability and robust computational properties, as supported by empirical studies and theoretical analyses (Castiglione et al., 2022, Tan et al., 2012, Tan et al., 2012).
