Nonconjugate Variational Message Passing
- Nonconjugate variational message passing is a framework that extends traditional variational methods to handle Bayesian models without conjugate relationships.
- It employs block-coordinate natural-gradient optimization and numerical quadrature to compute intractable expectations, enhancing inference in models like GLMMs and logistic regression.
- Stochastic implementations use mini-batch sampling and Robbins–Monro updates, enabling scalable and efficient posterior approximation in complex hierarchical models.
Nonconjugate variational message passing (NCVMP) is a class of algorithms extending variational message passing (VMP) to Bayesian models with nonconjugate likelihoods or prior–likelihood pairs. These methods perform approximate posterior inference using exponential-family variational approximations, even when the classical conjugate relationships required for closed-form variational Bayes updates are absent. NCVMP encompasses the key principles of block-coordinate natural-gradient optimization, flexible factor graph representations, and numerical quadrature for handling intractable expected sufficient statistics. This framework has facilitated scalable and accurate variational inference for hierarchical models such as generalized linear mixed models (GLMMs), regression models with non-differentiable losses, and structured prediction architectures.
1. Model Setting and Factor Graph Representation
NCVMP is formulated for Bayesian hierarchical models with potentially nonconjugate losses. Consider observations $y_1, \dots, y_n$ modeled by a "pseudo-likelihood" defined via a loss function $\psi$, where the linear predictor is
$$\eta_i = \mathbf{x}_i^\top \boldsymbol\beta + \mathbf{z}_i^\top \mathbf{u}, \qquad i = 1, \dots, n,$$
with fixed effects $\boldsymbol\beta \in \mathbb{R}^p$ and random effects $\mathbf{u} \in \mathbb{R}^q$, $\mathbf{u} \sim N(\mathbf{0}, \boldsymbol\Sigma_u)$. The loss-driven likelihood takes the form
$$p(y_i \mid \boldsymbol\beta, \mathbf{u}, \sigma) \propto \sigma^{-1} \exp\{-\psi(y_i, \eta_i)/\sigma\},$$
where $\sigma > 0$ is a dispersion parameter. The loss $\psi$ may be non-differentiable (e.g., quantile, hinge losses). Priors are typically chosen from conditionally conjugate families, such as Gaussian priors for $\boldsymbol\beta$ and $\mathbf{u}$ and inverse-gamma priors for variance and dispersion parameters. The joint posterior factors naturally into a product of prior and likelihood terms; this permits construction of the corresponding factor graph, with factor nodes for data likelihoods, priors, and hyperpriors, and variable nodes for each model parameter (Castiglione et al., 2022).
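As an illustration, the linear predictor and the loss-driven pseudo-likelihood can be sketched in a few lines. This is a minimal sketch assuming the standard check-loss parameterization $\rho_\tau(r) = r(\tau - \mathbb{1}\{r < 0\})$; the function names are illustrative, not taken from the cited papers.

```python
import numpy as np

def linear_predictor(X, Z, beta, u):
    """eta_i = x_i' beta + z_i' u: fixed effects plus random effects."""
    return X @ beta + Z @ u

def quantile_loss(y, eta, tau):
    """Check loss rho_tau(y - eta) = (y - eta) * (tau - 1{y - eta < 0})."""
    r = y - eta
    return r * (tau - (r < 0))

def log_pseudo_likelihood(y, eta, tau, sigma):
    """log p(y | eta, sigma) = -log(sigma) - rho_tau(y - eta)/sigma, up to a constant."""
    return -np.log(sigma) - quantile_loss(y, eta, tau) / sigma
```

Any other loss (hinge, logistic) slots into the same pseudo-likelihood template by replacing `quantile_loss`.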
2. Variational Family and Objective Function
NCVMP adopts a mean-field product-form exponential-family variational approximation
$$q(\boldsymbol\theta) = \prod_{j=1}^{M} q_j(\boldsymbol\theta_j),$$
where each factor lies in an exponential family,
$$q_j(\boldsymbol\theta_j) = \exp\{\boldsymbol\lambda_j^\top \mathbf{t}_j(\boldsymbol\theta_j) - A_j(\boldsymbol\lambda_j)\},$$
with natural parameters $\boldsymbol\lambda_j$, sufficient statistics $\mathbf{t}_j$, and log-partition function $A_j$. The goal is to minimize the Kullback–Leibler divergence $\mathrm{KL}(q \,\|\, p(\boldsymbol\theta \mid \mathbf{y}))$, equivalent to maximizing the evidence lower bound (ELBO)
$$\mathcal{L}(q) = \mathbb{E}_q[\log p(\mathbf{y}, \boldsymbol\theta)] - \mathbb{E}_q[\log q(\boldsymbol\theta)].$$
NCVMP performs block-coordinate fixed-point updates of the canonical parameters, shown to correspond to natural-gradient ascent in the space of exponential-family parameters:
$$\boldsymbol\lambda_j \leftarrow \boldsymbol\lambda_j + \mathbf{F}_j(\boldsymbol\lambda_j)^{-1} \nabla_{\boldsymbol\lambda_j} \mathcal{L},$$
where $\mathbf{F}_j(\boldsymbol\lambda_j) = \mathrm{Cov}_{q_j}[\mathbf{t}_j(\boldsymbol\theta_j)]$ is the Fisher information of $q_j$ (Castiglione et al., 2022, Tan et al., 2012).
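The correspondence between fixed-point updates and natural-gradient ascent can be checked numerically in a toy case. The sketch below is illustrative, for a univariate Gaussian with sufficient statistics $t(\theta) = (\theta, \theta^2)$: since the Euclidean gradient of $-\mathrm{KL}(q\,\|\,p)$ equals $F(\lambda_q)(\lambda_p - \lambda_q)$ for exponential families, a unit-step natural-gradient update jumps straight to the target's natural parameters.

```python
import numpy as np

def nat_params(m, s2):
    """Natural parameters of N(m, s2) w.r.t. sufficient stats t = (theta, theta^2)."""
    return np.array([m / s2, -1.0 / (2.0 * s2)])

def fisher_info(m, s2):
    """Fisher information of N(m, s2) in natural coordinates: Cov[t(theta)]."""
    return np.array([[s2, 2 * m * s2],
                     [2 * m * s2, 2 * s2**2 + 4 * m**2 * s2]])

def natural_gradient_step(lam_q, lam_p, m, s2):
    """One unit-step natural-gradient ascent on -KL(q || p).
    The Euclidean gradient is F(lam_q)(lam_p - lam_q), so preconditioning
    by F^{-1} gives the step lam_p - lam_q: one step reaches the target."""
    F = fisher_info(m, s2)
    euclidean_grad = F @ (lam_p - lam_q)
    return lam_q + np.linalg.solve(F, euclidean_grad)
```

In NCVMP the same geometry applies blockwise, with the gradient of the ELBO replacing the exact KL gradient.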
3. Message Passing and Handling Nonconjugacy
In NCVMP, each factor $f$ sends a message to variable node $\boldsymbol\theta_j$ based on the expected log-factor under the variational distribution of the other variables:
$$m_{f \to \boldsymbol\theta_j}(\boldsymbol\theta_j) \propto \exp\{\mathbb{E}_{q_{-j}}[\log f(\boldsymbol\theta)]\}.$$
Conjugate factors (Gaussian, gamma, etc.) yield closed-form messages corresponding to canonical-parameter contributions.
For nonconjugate data factors, such as those arising from non-differentiable or non-analytic loss functions, the message into the Gaussian block involves the expectation of the loss under the variational Gaussian marginal for the linear predictor. For each data unit the required quantity is
$$\mathbb{E}_q[\psi(y_i, \eta_i)],$$
where $\eta_i \sim N(\mu_i, \varsigma_i^2)$ under $q$, with $\mu_i$ and $\varsigma_i^2$ the variational mean and variance of the linear predictor. The required calculations are summarized by "smoothed-loss" integrals
$$B_r(\mu_i, \varsigma_i) = \frac{\partial^r}{\partial \mu_i^r}\, \mathbb{E}[\psi(y_i, \eta_i)], \qquad r = 0, 1, 2.$$
These yield the expectation ($B_0$), gradient ($B_1$), and Hessian ($B_2$) contributions needed for the quadratic updates. If $\psi$ is non-differentiable, the existence of $B_1$ and $B_2$ is established via expectations of weak derivatives, and closed-form formulae may be available (e.g., quantile, hinge losses) (Castiglione et al., 2022, Tan et al., 2012). Otherwise, one-dimensional quadrature (Gauss–Hermite, Clenshaw–Curtis) is used.
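When no closed form is available, the one-dimensional Gauss–Hermite rule is straightforward to apply. A minimal sketch, assuming the logistic loss $\psi(y, \eta) = \log(1 + e^\eta) - y\eta$ (function names are illustrative):

```python
import numpy as np

def smoothed_loss(loss, mu, sigma, n_points=30):
    """E[loss(eta)] for eta ~ N(mu, sigma^2) via Gauss-Hermite quadrature.
    hermgauss targets integrals of the form int f(x) exp(-x^2) dx, so we
    substitute eta = mu + sqrt(2) * sigma * x and divide by sqrt(pi)."""
    x, w = np.polynomial.hermite.hermgauss(n_points)
    eta = mu + np.sqrt(2.0) * sigma * x
    return (w * loss(eta)).sum() / np.sqrt(np.pi)

def logistic_loss(y):
    """Negative Bernoulli log-likelihood as a function of eta: log(1+e^eta) - y*eta."""
    return lambda eta: np.logaddexp(0.0, eta) - y * eta
```

Differentiating the smoothed loss with respect to `mu` (analytically or by applying the same rule to the derivative of the integrand) gives the $B_1$ and $B_2$ contributions.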
4. Algorithmic Implementation: Batch and Stochastic NCVMP
The batch NCVMP algorithm iteratively:
- Computes the smoothed-loss integrals (expectation, gradient, and Hessian contributions) for each data unit $i = 1, \dots, n$
- Updates dispersion and random-effect scale parameters via closed-form expressions
- Forms prior precision matrices
- Calculates the natural-gradient Gaussian updates for mean and covariance
- Optionally recomputes ELBO and checks convergence
Explicit update equations for the block parameters are provided. For the Gaussian block $q(\boldsymbol\beta, \mathbf{u}) = N(\boldsymbol\mu_q, \boldsymbol\Sigma_q)$, writing $\mathbf{c}_i = (\mathbf{x}_i^\top, \mathbf{z}_i^\top)^\top$ for the combined design vector, $\mathbf{P}$ for the prior precision matrix of $(\boldsymbol\beta, \mathbf{u})$, and $B_1(\mu_i, \varsigma_i)$, $B_2(\mu_i, \varsigma_i)$ for the first and second derivatives of the smoothed loss with respect to the mean $\mu_i$ of the linear predictor, the updates take the Newton-like form (suppressing expected inverse-dispersion factors multiplying the loss terms for readability)
$$\boldsymbol\Sigma_q^{\text{new}} = \Big(\mathbf{P} + \sum_{i=1}^n B_2(\mu_i, \varsigma_i)\, \mathbf{c}_i \mathbf{c}_i^\top\Big)^{-1},$$
$$\boldsymbol\mu_q^{\text{new}} = \boldsymbol\mu_q - \boldsymbol\Sigma_q^{\text{new}} \Big(\mathbf{P} \boldsymbol\mu_q + \sum_{i=1}^n B_1(\mu_i, \varsigma_i)\, \mathbf{c}_i\Big).$$
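A sketch of one Gaussian-block update, assuming a quantile loss at level `tau` with dispersion fixed at 1; `C` stacks the combined design vectors $\mathbf{c}_i$ row-wise and `P` is the prior precision of the Gaussian block. The closed-form derivatives used here are the standard smoothed-check-loss expressions, not code from the cited papers.

```python
import numpy as np
from math import erf, pi, sqrt, exp

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def gaussian_block_update(y, C, mu, Sigma, P, tau):
    """One Newton-like natural-gradient update of q(beta, u) = N(mu, Sigma)
    for the quantile loss. B1 and B2 are the first and second derivatives
    of the smoothed loss with respect to the linear-predictor mean."""
    eta_mean = C @ mu
    eta_sd = np.sqrt(np.einsum('ij,jk,ik->i', C, Sigma, C))  # row-wise c_i' Sigma c_i
    z = (eta_mean - y) / eta_sd
    B1 = np.array([Phi(zi) for zi in z]) - tau
    B2 = np.array([phi(zi) for zi in z]) / eta_sd
    Lam_new = P + (C * B2[:, None]).T @ C        # prior precision + sum_i B2_i c_i c_i'
    Sigma_new = np.linalg.inv(Lam_new)
    mu_new = mu - Sigma_new @ (P @ mu + C.T @ B1)
    return mu_new, Sigma_new
```

Iterating this update alternately with the dispersion and scale-parameter updates gives the batch algorithm's inner loop.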
Stochastic NCVMP scales to large datasets by sampling mini-batches of size $m$, estimating noisy natural gradients, and replacing sums over $i = 1, \dots, n$ by sums over the batch scaled by $n/m$. Robbins–Monro step sizes $\rho_t$, satisfying $\sum_t \rho_t = \infty$ and $\sum_t \rho_t^2 < \infty$, govern the updates of the canonical parameters. Other blocks (e.g., inverse-gamma factors) adopt analogous stochastic updates (Castiglione et al., 2022, Tan et al., 2012).
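The Robbins–Monro recursion is simple to state in code. A minimal sketch, assuming the common illustrative schedule $\rho_t = (t + t_0)^{-\kappa}$ with $\kappa \in (0.5, 1]$, and a mini-batch estimate pre-scaled by $n/m$ so that it is unbiased for the full-data quantity:

```python
import numpy as np

def robbins_monro_schedule(t, t0=10.0, kappa=0.75):
    """Step sizes rho_t = (t + t0)^(-kappa); sum rho_t diverges while
    sum rho_t^2 converges for kappa in (0.5, 1], as Robbins-Monro requires."""
    return (t + t0) ** (-kappa)

def stochastic_natural_param_update(lam, lam_hat, rho):
    """Convex-combination update in natural-parameter space:
    lam <- (1 - rho) * lam + rho * lam_hat, where lam_hat is an unbiased
    mini-batch estimate (scaled by n/m) of the full-data fixed point."""
    return (1.0 - rho) * lam + rho * lam_hat
```

With a fixed dataset, the recursion converges to the full-data fixed point, e.g. tracking $\sum_i y_i$ from mini-batch sums.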
5. Computational Complexity and Practical Considerations
- Batch NCVMP: Per-iteration cost is dominated by the one-dimensional quadrature over all $n$ data units and by dense matrix operations in the dimension of the Gaussian block.
- Stochastic NCVMP: Per-update cost scales with the mini-batch size rather than with $n$, with memory proportional to the batch size.
- Conjugate VMP avoids quadrature, yielding closed forms but similar dominant cost in matrix operations.
- Standard black-box variational inference requires Monte Carlo estimation of gradients in the block dimension, and is often more expensive and of higher variance.
Initialization from penalized quasi-likelihood estimates helps avoid poor local optima. Damping, or fixing tuning parameters (e.g., partial noncentering matrices), can stabilize convergence. The ELBO serves both as a convergence monitor and as a marginal-likelihood proxy for model comparison (Castiglione et al., 2022, Tan et al., 2012).
6. Examples: Quantile, Logistic, and Mixed Models
In quantile regression at level $\tau \in (0, 1)$, the loss is the check function
$$\psi(y, \eta) = \rho_\tau(y - \eta) = (y - \eta)\{\tau - \mathbb{1}(y - \eta < 0)\},$$
with closed-form smoothed-loss integrals: writing $z_i = (\mu_i - y_i)/\varsigma_i$,
$$B_0(\mu_i, \varsigma_i) = \tau (y_i - \mu_i) + (\mu_i - y_i)\,\Phi(z_i) + \varsigma_i\, \phi(z_i),$$
$$B_1(\mu_i, \varsigma_i) = \Phi(z_i) - \tau, \qquad B_2(\mu_i, \varsigma_i) = \phi(z_i)/\varsigma_i,$$
where $\Phi$ and $\phi$ denote the standard normal distribution and density functions.
Updates then follow through the general NCVMP equations. For logistic regression, the loss $\psi(y, \eta) = \log(1 + e^\eta) - y\eta$ lacks analytic smoothed-loss integrals; Gauss–Hermite quadrature is applied. GLMMs with partial noncentering leverage tuning matrices chosen by the local curvature of the likelihood to adapt the model representation and accelerate convergence (Castiglione et al., 2022, Tan et al., 2012).
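The closed-form quantile integrals can be implemented and cross-checked directly against simulation and finite differences. A sketch assuming the check loss and standard-normal smoothing identities; the function name `quantile_B` is illustrative.

```python
import numpy as np
from math import erf, pi, sqrt, exp

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def quantile_B(y, mu, s, tau):
    """Closed-form smoothed check loss and its mu-derivatives for eta ~ N(mu, s^2):
    B0 = E[rho_tau(y - eta)], B1 = dB0/dmu, B2 = d^2 B0/dmu^2."""
    z = (mu - y) / s
    B0 = tau * (y - mu) + (mu - y) * Phi(z) + s * phi(z)
    B1 = Phi(z) - tau
    B2 = phi(z) / s
    return B0, B1, B2
```

The derivative identities ($B_1$ and $B_2$ as first and second $\mu$-derivatives of $B_0$) hold exactly and can be verified by finite differences; $B_0$ can be checked by Monte Carlo.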
7. Extensions and Applications
Stochastic NCVMP integrates natural-gradient steps with mini-batch sampling, permitting scaling to massive datasets. The same message structure underlies diagnostics for prior–likelihood conflict, yielding computationally efficient p-values for Bayesian model checking (Tan et al., 2012). In hierarchical models such as GLMMs, partial noncentering adaptively yields improved approximation accuracy and accelerated convergence relative to both centered and noncentered schemes (Tan et al., 2012). The approach is broadly applicable to models with non-analytic, non-differentiable, or nonconjugate losses, encompassing settings such as support vector machines and mixed additive models.
Table: Summary of NCVMP Features
| Aspect | Description | Source |
|---|---|---|
| Variational family | Mean-field, exponential family (Gaussian, IG) | (Castiglione et al., 2022) |
| Nonconjugacy handling | 1-D quadrature for expected sufficient stats | (Castiglione et al., 2022) |
| Optimization method | Block-coordinate, natural-gradient, fixed-point | (Tan et al., 2012) |
| Stochastic extension | Robbins–Monro step size, mini-batch sampling | (Tan et al., 2012) |
| Application domains | GLMMs, quantile, logistic, general regression | (Castiglione et al., 2022) |
| Computational cost | Quadrature plus dense matrix operations; batch/stochastic variants | (Castiglione et al., 2022) |
NCVMP thus enables scalable, accurate Bayesian inference in nonconjugate models, with broad applicability and robust computational properties, as supported by empirical studies and theoretical analyses (Castiglione et al., 2022, Tan et al., 2012).