Nonconjugate Variational Message Passing
- Nonconjugate variational message passing is a framework that extends traditional variational methods to handle Bayesian models without conjugate relationships.
- It employs block-coordinate natural-gradient optimization and numerical quadrature to compute intractable expectations, enhancing inference in models like GLMMs and logistic regression.
- Stochastic implementations use mini-batch sampling and Robbins–Monro updates, enabling scalable and efficient posterior approximation in complex hierarchical models.
Nonconjugate variational message passing (NCVMP) is a class of algorithms extending variational message passing (VMP) to Bayesian models with nonconjugate likelihoods or prior–likelihood pairs. These methods perform approximate posterior inference using exponential-family variational approximations, even when the classical conjugate relationships required for closed-form variational Bayes updates are absent. NCVMP encompasses the key principles of block-coordinate natural-gradient optimization, flexible factor graph representations, and numerical quadrature for handling intractable expected sufficient statistics. This framework has facilitated scalable and accurate variational inference for hierarchical models such as generalized linear mixed models (GLMMs), regression models with non-differentiable losses, and structured prediction architectures.
1. Model Setting and Factor Graph Representation
NCVMP is formulated for Bayesian hierarchical models with potentially nonconjugate losses. Consider observations $y_1, \dots, y_n$ modeled by a "pseudo-likelihood" defined via a loss function $\psi$, where the linear predictor is
$$\eta_i = \mathbf{x}_i^\top \boldsymbol\beta + \mathbf{z}_i^\top \mathbf{u}, \qquad i = 1, \dots, n,$$
with fixed effects $\boldsymbol\beta \in \mathbb{R}^p$ and random effects $\mathbf{u} \in \mathbb{R}^q$, $\mathbf{u} \sim N(\mathbf{0}, \boldsymbol\Sigma_u)$. The loss-driven likelihood takes the form
$$p(y_i \mid \boldsymbol\beta, \mathbf{u}, \sigma) \propto \sigma^{-1} \exp\{-\psi(y_i, \eta_i)/\sigma\},$$
where $\sigma > 0$ is a dispersion parameter. The loss $\psi$ may be non-differentiable (e.g., quantile, hinge losses). Priors are typically chosen from conditionally conjugate families, such as Gaussian priors for $\boldsymbol\beta$ and $\mathbf{u}$ and inverse-gamma priors for variance and dispersion parameters. The joint posterior factors naturally into a product of prior and likelihood terms; this permits construction of the corresponding factor graph, with factor nodes for data likelihoods, priors, and hyperpriors, and variable nodes for each model parameter (Castiglione et al., 2022).
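As an illustration, the linear predictor and the loss-driven pseudo-likelihood can be sketched in a few lines. This is a minimal sketch assuming the standard check-loss parameterization $\rho_\tau(r) = r(\tau - \mathbb{1}\{r < 0\})$; the function names are illustrative, not taken from the cited papers.

```python
import numpy as np

def linear_predictor(X, Z, beta, u):
    """eta_i = x_i' beta + z_i' u: fixed effects plus random effects."""
    return X @ beta + Z @ u

def quantile_loss(y, eta, tau):
    """Check loss rho_tau(y - eta) = (y - eta) * (tau - 1{y - eta < 0})."""
    r = y - eta
    return r * (tau - (r < 0))

def log_pseudo_likelihood(y, eta, tau, sigma):
    """log p(y | eta, sigma) = -log(sigma) - rho_tau(y - eta)/sigma, up to a constant."""
    return -np.log(sigma) - quantile_loss(y, eta, tau) / sigma
```

Any other loss (hinge, logistic) slots into the same pseudo-likelihood template by replacing `quantile_loss`.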
2. Variational Family and Objective Function
NCVMP adopts a mean-field product-form exponential-family variational approximation
$$q(\boldsymbol\theta) = \prod_{j=1}^{M} q_j(\boldsymbol\theta_j),$$
where each factor lies in an exponential family,
$$q_j(\boldsymbol\theta_j) = \exp\{\boldsymbol\lambda_j^\top \mathbf{t}_j(\boldsymbol\theta_j) - A_j(\boldsymbol\lambda_j)\},$$
with natural parameters $\boldsymbol\lambda_j$, sufficient statistics $\mathbf{t}_j$, and log-partition function $A_j$. The goal is to minimize the Kullback–Leibler divergence $\mathrm{KL}(q \,\|\, p(\boldsymbol\theta \mid \mathbf{y}))$, equivalent to maximizing the evidence lower bound (ELBO)
$$\mathcal{L}(q) = \mathbb{E}_q[\log p(\mathbf{y}, \boldsymbol\theta)] - \mathbb{E}_q[\log q(\boldsymbol\theta)].$$
NCVMP performs block-coordinate fixed-point updates of the canonical parameters, shown to correspond to natural-gradient ascent in the space of exponential-family parameters:
$$\boldsymbol\lambda_j \leftarrow \boldsymbol\lambda_j + \mathbf{F}_j(\boldsymbol\lambda_j)^{-1} \nabla_{\boldsymbol\lambda_j} \mathcal{L},$$
where $\mathbf{F}_j(\boldsymbol\lambda_j) = \mathrm{Cov}_{q_j}[\mathbf{t}_j(\boldsymbol\theta_j)]$ is the Fisher information of $q_j$ (Castiglione et al., 2022, Tan et al., 2012).
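The correspondence between fixed-point updates and natural-gradient ascent can be checked numerically in a toy case. The sketch below is illustrative, for a univariate Gaussian with sufficient statistics $t(\theta) = (\theta, \theta^2)$: since the Euclidean gradient of $-\mathrm{KL}(q\,\|\,p)$ equals $F(\lambda_q)(\lambda_p - \lambda_q)$ for exponential families, a unit-step natural-gradient update jumps straight to the target's natural parameters.

```python
import numpy as np

def nat_params(m, s2):
    """Natural parameters of N(m, s2) w.r.t. sufficient stats t = (theta, theta^2)."""
    return np.array([m / s2, -1.0 / (2.0 * s2)])

def fisher_info(m, s2):
    """Fisher information of N(m, s2) in natural coordinates: Cov[t(theta)]."""
    return np.array([[s2, 2 * m * s2],
                     [2 * m * s2, 2 * s2**2 + 4 * m**2 * s2]])

def natural_gradient_step(lam_q, lam_p, m, s2):
    """One unit-step natural-gradient ascent on -KL(q || p).
    The Euclidean gradient is F(lam_q)(lam_p - lam_q), so preconditioning
    by F^{-1} gives the step lam_p - lam_q: one step reaches the target."""
    F = fisher_info(m, s2)
    euclidean_grad = F @ (lam_p - lam_q)
    return lam_q + np.linalg.solve(F, euclidean_grad)
```

In NCVMP the same geometry applies blockwise, with the gradient of the ELBO replacing the exact KL gradient.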
3. Message Passing and Handling Nonconjugacy
In NCVMP, each factor $f$ sends a message to variable node $\boldsymbol\theta_j$ based on the expected log-factor under the variational distribution of the other variables:
$$m_{f \to \boldsymbol\theta_j}(\boldsymbol\theta_j) \propto \exp\{\mathbb{E}_{q_{-j}}[\log f(\boldsymbol\theta)]\}.$$
Conjugate factors (Gaussian, gamma, etc.) yield closed-form messages corresponding to canonical-parameter contributions.
For nonconjugate data factors, such as those arising from non-differentiable or non-analytic loss functions, the message into the Gaussian block involves the expectation of the loss under the variational Gaussian marginal for the linear predictor. For each data unit the required quantity is
$$\mathbb{E}_q[\psi(y_i, \eta_i)],$$
where $\eta_i \sim N(\mu_i, \varsigma_i^2)$ under $q$, with $\mu_i$ and $\varsigma_i^2$ the variational mean and variance of the linear predictor. The required calculations are summarized by "smoothed-loss" integrals
$$B_r(\mu_i, \varsigma_i) = \frac{\partial^r}{\partial \mu_i^r}\, \mathbb{E}[\psi(y_i, \eta_i)], \qquad r = 0, 1, 2.$$
These yield the expectation ($B_0$), gradient ($B_1$), and Hessian ($B_2$) contributions needed for the quadratic updates. If $\psi$ is non-differentiable, the existence of $B_1$ and $B_2$ is established via expectations of weak derivatives, and closed-form formulae may be available (e.g., quantile, hinge losses) (Castiglione et al., 2022, Tan et al., 2012). Otherwise, one-dimensional quadrature (Gauss–Hermite, Clenshaw–Curtis) is used.
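When no closed form is available, the one-dimensional Gauss–Hermite rule is straightforward to apply. A minimal sketch, assuming the logistic loss $\psi(y, \eta) = \log(1 + e^\eta) - y\eta$ (function names are illustrative):

```python
import numpy as np

def smoothed_loss(loss, mu, sigma, n_points=30):
    """E[loss(eta)] for eta ~ N(mu, sigma^2) via Gauss-Hermite quadrature.
    hermgauss targets integrals of the form int f(x) exp(-x^2) dx, so we
    substitute eta = mu + sqrt(2) * sigma * x and divide by sqrt(pi)."""
    x, w = np.polynomial.hermite.hermgauss(n_points)
    eta = mu + np.sqrt(2.0) * sigma * x
    return (w * loss(eta)).sum() / np.sqrt(np.pi)

def logistic_loss(y):
    """Negative Bernoulli log-likelihood as a function of eta: log(1+e^eta) - y*eta."""
    return lambda eta: np.logaddexp(0.0, eta) - y * eta
```

Differentiating the smoothed loss with respect to `mu` (analytically or by applying the same rule to the derivative of the integrand) gives the $B_1$ and $B_2$ contributions.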
4. Algorithmic Implementation: Batch and Stochastic NCVMP
The batch NCVMP algorithm iteratively:
- Computes the smoothed-loss integrals (expectation, gradient, and Hessian contributions) for each data unit $i = 1, \dots, n$
- Updates dispersion and random-effect scale parameters via closed-form expressions
- Forms prior precision matrices
- Calculates the natural-gradient Gaussian updates for mean and covariance
- Optionally recomputes ELBO and checks convergence
Explicit update equations for the block parameters are provided. For the Gaussian block $q(\boldsymbol\beta, \mathbf{u}) = N(\boldsymbol\mu_q, \boldsymbol\Sigma_q)$, writing $\mathbf{c}_i = (\mathbf{x}_i^\top, \mathbf{z}_i^\top)^\top$ for the combined design vector, $\mathbf{P}$ for the prior precision matrix of $(\boldsymbol\beta, \mathbf{u})$, and $B_1(\mu_i, \varsigma_i)$, $B_2(\mu_i, \varsigma_i)$ for the first and second derivatives of the smoothed loss with respect to the mean $\mu_i$ of the linear predictor, the updates take the Newton-like form (suppressing expected inverse-dispersion factors multiplying the loss terms for readability)
$$\boldsymbol\Sigma_q^{\text{new}} = \Big(\mathbf{P} + \sum_{i=1}^n B_2(\mu_i, \varsigma_i)\, \mathbf{c}_i \mathbf{c}_i^\top\Big)^{-1},$$
$$\boldsymbol\mu_q^{\text{new}} = \boldsymbol\mu_q - \boldsymbol\Sigma_q^{\text{new}} \Big(\mathbf{P} \boldsymbol\mu_q + \sum_{i=1}^n B_1(\mu_i, \varsigma_i)\, \mathbf{c}_i\Big).$$
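A sketch of one Gaussian-block update, assuming a quantile loss at level `tau` with dispersion fixed at 1; `C` stacks the combined design vectors $\mathbf{c}_i$ row-wise and `P` is the prior precision of the Gaussian block. The closed-form derivatives used here are the standard smoothed-check-loss expressions, not code from the cited papers.

```python
import numpy as np
from math import erf, pi, sqrt, exp

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def gaussian_block_update(y, C, mu, Sigma, P, tau):
    """One Newton-like natural-gradient update of q(beta, u) = N(mu, Sigma)
    for the quantile loss. B1 and B2 are the first and second derivatives
    of the smoothed loss with respect to the linear-predictor mean."""
    eta_mean = C @ mu
    eta_sd = np.sqrt(np.einsum('ij,jk,ik->i', C, Sigma, C))  # row-wise c_i' Sigma c_i
    z = (eta_mean - y) / eta_sd
    B1 = np.array([Phi(zi) for zi in z]) - tau
    B2 = np.array([phi(zi) for zi in z]) / eta_sd
    Lam_new = P + (C * B2[:, None]).T @ C        # prior precision + sum_i B2_i c_i c_i'
    Sigma_new = np.linalg.inv(Lam_new)
    mu_new = mu - Sigma_new @ (P @ mu + C.T @ B1)
    return mu_new, Sigma_new
```

Iterating this update alternately with the dispersion and scale-parameter updates gives the batch algorithm's inner loop.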
Stochastic NCVMP scales to large datasets by sampling mini-batches of size $m$, estimating noisy natural gradients, and replacing sums over $i = 1, \dots, n$ by sums over the batch scaled by $n/m$. Robbins–Monro step sizes $\rho_t$, satisfying $\sum_t \rho_t = \infty$ and $\sum_t \rho_t^2 < \infty$, govern the updates of the canonical parameters. Other blocks (e.g., inverse-gamma factors) adopt analogous stochastic updates (Castiglione et al., 2022, Tan et al., 2012).
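The Robbins–Monro recursion is simple to state in code. A minimal sketch, assuming the common illustrative schedule $\rho_t = (t + t_0)^{-\kappa}$ with $\kappa \in (0.5, 1]$, and a mini-batch estimate pre-scaled by $n/m$ so that it is unbiased for the full-data quantity:

```python
import numpy as np

def robbins_monro_schedule(t, t0=10.0, kappa=0.75):
    """Step sizes rho_t = (t + t0)^(-kappa); sum rho_t diverges while
    sum rho_t^2 converges for kappa in (0.5, 1], as Robbins-Monro requires."""
    return (t + t0) ** (-kappa)

def stochastic_natural_param_update(lam, lam_hat, rho):
    """Convex-combination update in natural-parameter space:
    lam <- (1 - rho) * lam + rho * lam_hat, where lam_hat is an unbiased
    mini-batch estimate (scaled by n/m) of the full-data fixed point."""
    return (1.0 - rho) * lam + rho * lam_hat
```

With a fixed dataset, the recursion converges to the full-data fixed point, e.g. tracking $\sum_i y_i$ from mini-batch sums.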
5. Computational Complexity and Practical Considerations
- Batch NCVMP: Per-iteration cost is dominated by the one-dimensional quadrature over all $n$ data units and by dense matrix operations in the dimension of the Gaussian block.
- Stochastic NCVMP: Per-update cost scales with the mini-batch size rather than with $n$, with memory proportional to the batch size.
- Conjugate VMP avoids quadrature, yielding closed forms but similar dominant cost in matrix operations.
- Standard black-box variational inference requires Monte Carlo estimation of gradients in the block dimension, and is often more expensive and of higher variance.
Initialization from penalized quasi-likelihood estimates helps avoid poor local optima. Damping, or fixing tuning parameters (e.g., partial noncentering matrices), can stabilize convergence. The ELBO serves both as a convergence monitor and as a marginal-likelihood proxy for model comparison (Castiglione et al., 2022, Tan et al., 2012).
6. Examples: Quantile, Logistic, and Mixed Models
In quantile regression at level $\tau \in (0, 1)$, the loss is the check function
$$\psi(y, \eta) = \rho_\tau(y - \eta) = (y - \eta)\{\tau - \mathbb{1}(y - \eta < 0)\},$$
with closed-form smoothed-loss integrals: writing $z_i = (\mu_i - y_i)/\varsigma_i$,
$$B_0(\mu_i, \varsigma_i) = \tau (y_i - \mu_i) + (\mu_i - y_i)\,\Phi(z_i) + \varsigma_i\, \phi(z_i),$$
$$B_1(\mu_i, \varsigma_i) = \Phi(z_i) - \tau, \qquad B_2(\mu_i, \varsigma_i) = \phi(z_i)/\varsigma_i,$$
where $\Phi$ and $\phi$ denote the standard normal distribution and density functions.
Updates then follow through the general NCVMP equations. For logistic regression, the loss $\psi(y, \eta) = \log(1 + e^\eta) - y\eta$ lacks analytic smoothed-loss integrals; Gauss–Hermite quadrature is applied. GLMMs with partial noncentering leverage tuning matrices chosen by the local curvature of the likelihood to adapt the model representation and accelerate convergence (Castiglione et al., 2022, Tan et al., 2012).
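The closed-form quantile integrals can be implemented and cross-checked directly against simulation and finite differences. A sketch assuming the check loss and standard-normal smoothing identities; the function name `quantile_B` is illustrative.

```python
import numpy as np
from math import erf, pi, sqrt, exp

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def quantile_B(y, mu, s, tau):
    """Closed-form smoothed check loss and its mu-derivatives for eta ~ N(mu, s^2):
    B0 = E[rho_tau(y - eta)], B1 = dB0/dmu, B2 = d^2 B0/dmu^2."""
    z = (mu - y) / s
    B0 = tau * (y - mu) + (mu - y) * Phi(z) + s * phi(z)
    B1 = Phi(z) - tau
    B2 = phi(z) / s
    return B0, B1, B2
```

The derivative identities ($B_1$ and $B_2$ as first and second $\mu$-derivatives of $B_0$) hold exactly and can be verified by finite differences; $B_0$ can be checked by Monte Carlo.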
7. Extensions and Applications
Stochastic NCVMP integrates natural-gradient steps with mini-batch sampling, permitting scaling to massive datasets. The same message structure underlies diagnostics for prior–likelihood conflict, yielding computationally efficient p-values for Bayesian model checking (Tan et al., 2012). In hierarchical models such as GLMMs, partial noncentering adaptively yields improved approximation accuracy and accelerated convergence relative to both centered and noncentered schemes (Tan et al., 2012). The approach is broadly applicable to models with non-analytic, non-differentiable, or nonconjugate losses, encompassing settings such as support vector machines and mixed additive models.
Table: Summary of NCVMP Features
| Aspect | Description | Source |
|---|---|---|
| Variational family | Mean-field, exponential family (Gaussian, IG) | (Castiglione et al., 2022) |
| Nonconjugacy handling | 1-D quadrature for expected sufficient stats | (Castiglione et al., 2022) |
| Optimization method | Block-coordinate, natural-gradient, fixed-point | (Tan et al., 2012) |
| Stochastic extension | Robbins–Monro step size, mini-batch sampling | (Tan et al., 2012) |
| Application domains | GLMMs, quantile, logistic, general regression | (Castiglione et al., 2022) |
| Computational cost | Quadrature plus dense matrix operations; batch/stochastic variants | (Castiglione et al., 2022) |
NCVMP thus enables scalable, accurate Bayesian inference in nonconjugate models, with broad applicability and robust computational properties, as supported by empirical studies and theoretical analyses (Castiglione et al., 2022, Tan et al., 2012).