Nonconjugate Variational Message Passing
- Nonconjugate Variational Message Passing is a structured framework enabling approximate Bayesian inference for models with nonconjugate likelihoods.
- It leverages fixed-point equations, natural gradients, and one-dimensional quadrature to efficiently handle complex likelihoods without closed-form solutions.
- NCVMP scales to large datasets via stochastic updates and is applied in GLMMs, sparse GP regression, SVMs, and quantile regression.
Nonconjugate Variational Message Passing (NCVMP) is a structured optimization framework for variational inference in Bayesian models with nonconjugate or non-differentiable likelihood factors. By generalizing the classical variational message passing (VMP) approach beyond conjugate exponential families, NCVMP enables scalable, modular, and accurate approximate posterior inference for a broad class of hierarchical models, including generalized linear mixed models (GLMMs), sparse spectrum Gaussian process (GP) regression, support vector machines, and quantile regression. This methodology fundamentally relies on leveraging fixed-point equations for variational exponential family factors, employing natural gradients in the evidence lower bound (ELBO) landscape, and using efficient one-dimensional quadrature to bypass the computational bottlenecks of high-dimensional integrations (Castiglione et al., 2022, Tan et al., 2013, Tan et al., 2012, Tan et al., 2012).
1. Variational Inference for Nonconjugate Models
Variational inference seeks an approximating density q(θ) to the true Bayesian posterior p(θ | y), targeting minimization of the Kullback-Leibler divergence KL(q(θ) ‖ p(θ | y)) or, equivalently, maximization of the ELBO

L(q) = E_q[log p(y, θ)] − E_q[log q(θ)],

which satisfies log p(y) = L(q) + KL(q(θ) ‖ p(θ | y)).
Classical VMP is tractable when each factor in the model's factor graph is conjugate to its associated variable, leading to closed-form coordinate updates. However, in regression models with complex likelihoods—such as non-differentiable loss terms or priors outside the conjugate exponential class—these closed forms are unavailable or intractable. NCVMP circumvents this by employing natural-gradient fixed points for exponential-family variational factors, allowing flexible posterior approximation and supporting models with nonconjugate structure (Castiglione et al., 2022, Tan et al., 2012).
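To make the objective concrete, the following sketch (a toy conjugate Gaussian model of our own choosing, not from the cited papers) estimates the ELBO by Monte Carlo; because the exact posterior is available here, plugging it in as q makes the bound tight, which is a useful sanity check for any ELBO implementation.

```python
import numpy as np

def log_norm_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def elbo_mc(y, q_mean, q_var, n_samples=10_000, seed=0):
    """Monte Carlo estimate of L(q) = E_q[log p(y, theta) - log q(theta)]
    for the toy model y ~ N(theta, 1), theta ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    theta = q_mean + np.sqrt(q_var) * rng.standard_normal(n_samples)
    log_joint = log_norm_pdf(y, theta, 1.0) + log_norm_pdf(theta, 0.0, 1.0)
    log_q = log_norm_pdf(theta, q_mean, q_var)
    return np.mean(log_joint - log_q)

y = 1.3
# The exact posterior is N(y/2, 1/2); using it as q gives KL(q || p) = 0,
# so the ELBO equals the log evidence log N(y; 0, 2) with zero MC variance.
elbo = elbo_mc(y, q_mean=y / 2, q_var=0.5)
log_evidence = log_norm_pdf(y, 0.0, 2.0)
```

In nonconjugate models the exact posterior is unavailable, but the same estimator still serves as a convergence monitor for the optimization.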
2. Model Specification and Factor-Graph Representation
In NCVMP, the model and inference structure are formalized in terms of a bipartite factor graph. Variable nodes represent latent parameters (e.g., regression coefficients, random effects, variance components), while factor nodes correspond to likelihood and prior contributions. For example, a generalized Bayesian regression model may be specified as follows:
- Linear predictor: η_i = x_iᵀ β + z_iᵀ u, for i = 1, …, n,
- Pseudo-likelihood: p(y | β, u) ∝ exp{−Σ_{i=1}^n ψ(y_i, η_i)}, with ψ a possibly non-differentiable loss,
- Priors: Gaussian for fixed and random effects, inverse gamma or half-Cauchy for variance components.

The joint model factorizes as p(y, β, u, σ²) = p(y | β, u) p(β) p(u | σ²) p(σ²), with nonconjugacy naturally handled at the message-passing level (Castiglione et al., 2022, Tan et al., 2013).
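The factor-graph view can be made concrete by representing each likelihood or prior contribution as a function node over the latent variables; the log joint is then the sum over factor nodes. The sketch below is illustrative, with hypothetical helper names (make_factors, log_joint) and a ridge-type model chosen for simplicity, not code from the cited papers.

```python
import numpy as np

def make_factors(X, y, loss, prior_var=10.0):
    """Build factor nodes for log p(y, beta): one pseudo-likelihood
    factor exp{-loss(y_i, x_i' beta)} per observation, plus a Gaussian
    prior factor on the coefficient vector."""
    factors = [lambda beta, xi=xi, yi=yi: -loss(yi, xi @ beta)
               for xi, yi in zip(X, y)]
    factors.append(lambda beta: -0.5 * beta @ beta / prior_var)
    return factors

def log_joint(factors, beta):
    """Log of the (unnormalized) joint: the sum over factor nodes."""
    return sum(f(beta) for f in factors)

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 3))
beta_true = np.array([1.0, -0.5, 0.25])
y = X @ beta_true + rng.standard_normal(20)
# Any loss slots in here, e.g. a pinball or hinge loss for nonconjugate models.
squared = lambda yi, eta: 0.5 * (yi - eta) ** 2
factors = make_factors(X, y, squared)
lj = log_joint(factors, beta_true)
```

The point of the decomposition is modularity: message-passing updates touch only the factors adjacent to a variable node, so swapping the loss changes no other part of the inference code.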
3. NCVMP Algorithmic Framework
3.1 Variational Factorization and Update Equations
The mean-field factorization q(θ) = ∏_i q_i(θ_i) is imposed, with each q_i(θ_i) in a chosen exponential family with natural parameter λ_i. Writing S(λ) = E_q[log p(y, θ)], the NCVMP update for the natural parameters of q_i(θ_i) is given by

λ_i ← F(λ_i)^{-1} ∇_{λ_i} S(λ),

where F(λ_i) is the Fisher information matrix of q_i, i.e., the covariance of the sufficient statistics of q_i. For Gaussian variational factors q_i(θ_i) = N(μ_i, Σ_i) (common in regression and mixed models), this yields updates of the form

Σ_i ← (−2 ∂S/∂Σ_i)^{-1}, μ_i ← μ_i + Σ_i ∇_{μ_i} S,

with explicit formulas for gradients and Hessians dependent on expectations, under q, of derivatives of the loss (Castiglione et al., 2022, Tan et al., 2012).
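For a single univariate Gaussian factor q(θ) = N(m, v), the update pair reduces to two scalar recursions, v ← −1/(2 ∂S/∂v) and m ← m + v ∂S/∂m. The sketch below iterates them for a one-parameter Poisson log-link toy model of our own choosing; this case is convenient because E_{N(m,v)}[e^θ] = e^{m+v/2} makes S available in closed form.

```python
import numpy as np

def ncvmp_poisson(y, prior_var=4.0, n_iter=100):
    """NCVMP fixed-point iterations for q(theta) = N(m, v) under
    y ~ Poisson(exp(theta)), theta ~ N(0, prior_var).
    Up to a constant, S(m, v) = y*m - exp(m + v/2) - (m^2 + v)/(2*prior_var)."""
    m, v = 0.0, 1.0
    for _ in range(n_iter):
        dS_dv = -0.5 * np.exp(m + v / 2) - 0.5 / prior_var
        v = -1.0 / (2.0 * dS_dv)      # Sigma update: v <- (-2 dS/dv)^{-1}
        dS_dm = y - np.exp(m + v / 2) - m / prior_var
        m = m + v * dS_dm             # mu update: m <- m + v dS/dm
    return m, v

m, v = ncvmp_poisson(y=7)
```

At a fixed point the mean recursion is stationary, i.e., ∂S/∂m = 0, which gives a direct convergence check without evaluating the ELBO.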
3.2 Handling Nonconjugacy
Critical to NCVMP is that the required expectations in the updates involve, for nonconjugate factors, only one-dimensional Gaussian integrals of derivatives of the loss ψ, of the form

E_q[ψ^{(r)}(y_i, η_i)] = ∫ ψ^{(r)}(y_i, η) φ(η; m_i, s_i²) dη,

where φ(·; m, s²) denotes the N(m, s²) density and m_i, s_i² are the mean and variance of the linear predictor η_i under q. These are efficiently evaluated by adaptive Gauss-Hermite quadrature, even for non-differentiable or otherwise intractable likelihoods. This key property enables applicability to models with arbitrary loss functions (Castiglione et al., 2022, Tan et al., 2012).
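As an illustration, the expectation of the non-differentiable pinball loss of quantile regression under a Gaussian q can be computed by a plain (non-adaptive) Gauss-Hermite rule; the loss, node count, and parameter values below are our own illustrative choices. For this particular loss a closed form exists, which the sketch uses as a check.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from math import erf, exp, pi, sqrt

def gauss_hermite_expect(f, mean, sd, n_nodes=100):
    """E_{N(mean, sd^2)}[f(x)] via Gauss-Hermite quadrature:
    int f(x) exp(-x^2) dx ~ sum_j w_j f(x_j), after the change of
    variables x = mean + sqrt(2)*sd*z."""
    nodes, weights = hermgauss(n_nodes)
    x = mean + np.sqrt(2.0) * sd * nodes
    return np.sum(weights * f(x)) / np.sqrt(np.pi)

def pinball(u, tau):
    """Quantile-regression loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

y, m, s, tau = 0.7, 0.2, 0.9, 0.25
approx = gauss_hermite_expect(lambda eta: pinball(y - eta, tau), m, s)

# Closed-form check: with u = y - eta ~ N(mu, s^2),
# E[rho_tau(u)] = tau*E[u^+] + (1-tau)*E[u^+ - mu].
def std_norm_pdf(z): return exp(-0.5 * z * z) / sqrt(2 * pi)
def std_norm_cdf(z): return 0.5 * (1 + erf(z / sqrt(2)))
mu = y - m
pos_part = mu * std_norm_cdf(mu / s) + s * std_norm_pdf(mu / s)
exact = tau * pos_part + (1 - tau) * (pos_part - mu)
```

A fixed rule converges only algebraically at the kink, which is why the cited work uses adaptive quadrature; even so, modest node counts give accuracy well beyond what the variational approximation itself warrants.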
3.3 Stochastic Variational Updates
To scale NCVMP to large datasets, minibatch variants are implemented: at each iteration, a random subsample is used to estimate the requisite gradients and Hessians, and natural parameters are updated using a Robbins-Monro step-size schedule. This reduces per-iteration complexity from O(n) to O(m) for minibatch size m ≪ n, at the expense of added stochasticity (Castiglione et al., 2022, Tan et al., 2012). Convergence is monitored via the ELBO. Hybrid schedules, in which stochastic updates are used initially and then switched to batch updates, accelerate optimization for moderate-size problems (Tan et al., 2012).
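A minimal sketch of the minibatch scheme follows, on a toy conjugate Gaussian-mean model so the full-data answer is known in closed form; the polynomial step-size schedule is one common Robbins-Monro choice, and all names are illustrative.

```python
import numpy as np

def stochastic_natural_update(y, prior_prec=0.1, m_batch=50,
                              n_iter=2000, seed=0):
    """Robbins-Monro updates of the natural parameters (lam1, lam2) of
    q(theta) = N(lam1/lam2, 1/lam2) for y_i ~ N(theta, 1),
    theta ~ N(0, 1/prior_prec). Each minibatch gives an unbiased
    estimate of the full-data natural parameters by rescaling the
    likelihood contribution by n/m."""
    rng = np.random.default_rng(seed)
    n = len(y)
    lam1, lam2 = 0.0, 1.0
    for t in range(1, n_iter + 1):
        batch = rng.choice(n, size=m_batch, replace=False)
        lam1_hat = (n / m_batch) * y[batch].sum()  # noisy estimate of sum(y)
        lam2_hat = prior_prec + n                  # precision is exact here
        rho = (t + 10.0) ** -0.7                   # Robbins-Monro step size
        lam1 = (1 - rho) * lam1 + rho * lam1_hat
        lam2 = (1 - rho) * lam2 + rho * lam2_hat
    return lam1 / lam2, 1.0 / lam2                 # posterior mean, variance

rng = np.random.default_rng(42)
y = 2.0 + rng.standard_normal(1000)
mean_est, var_est = stochastic_natural_update(y)
exact_mean = y.sum() / (0.1 + len(y))
```

Averaging in the natural-parameter space, rather than in the mean parameters, is what makes the convex-combination update valid, since natural parameters of exponential families combine linearly.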
4. Application Domains and Model Classes
NCVMP provides an inference engine for a broad spectrum of nonconjugate models:
- Generalized Linear Mixed Models (GLMMs): Partial and fully noncentered parametrizations in GLMMs substantially accelerate convergence, adaptively interpolating between the regimes where centering or noncentering is statistically and numerically optimal. This framework is compatible with both Poisson and Bernoulli likelihoods, with nonconjugate terms handled by quadrature (Tan et al., 2012, Tan et al., 2012).
- Sparse Spectrum Gaussian Process Regression: The spectral parameterization of GP covariances yields nonconjugacy with respect to kernel hyperparameters. NCVMP yields closed-form and quadrature-based updates for both local and global variational parameters, and adaptive natural gradient step sizes further accelerate optimization (Tan et al., 2013).
- Non-differentiable and Nonconjugate Losses: Support vector machines and quantile regression models, which feature non-differentiable losses, are efficiently accommodated, since NCVMP updates do not require smoothness or analytic forms for loss derivatives (Castiglione et al., 2022).
5. Empirical Performance and Complexity
Extensive simulation and real-data studies demonstrate that NCVMP consistently outperforms conjugate mean field variational Bayes (MFVB) and Laplace approximations in terms of the ELBO and accuracy of marginal posteriors (Castiglione et al., 2022). Key findings include:
- Per-iteration cost for batch NCVMP is nearly identical to that of MFVB.
- Wall-clock runtimes for NCVMP are comparable to, or slightly shorter than, those of MFVB and Laplace approximations.
- Stochastic NCVMP is effective for massive datasets, providing favorable trade-offs between speed and approximation accuracy (Tan et al., 2012, Castiglione et al., 2022).
- Posterior means obtained via NCVMP closely match those of MCMC, though, similar to other VB methods, posterior variances can be underestimated; partial noncentering ameliorates this bias (Tan et al., 2012).
6. Theoretical Properties and Extension Strategies
- Natural Gradient Structure: The fixed-point equations of NCVMP correspond to natural-gradient ascent in the ELBO, which improves conditioning and may explain empirical acceleration in convergence relative to standard coordinate ascent (Tan et al., 2013).
- Adaptive Step Sizing: Overrelaxed natural gradient steps, with ELBO-based monitoring, halve convergence time in GP regression and yield similar improvements in other models (Tan et al., 2013).
- Conflict Diagnostics: The message structure in NCVMP allows principled diagnostics of prior-likelihood conflict as by-products of inference, providing an alternative to model criticism via cross-validatory MCMC (Tan et al., 2012).
- Model Selection: Since the (optimized) ELBO provides a lower bound to the log marginal likelihood, it can be leveraged for Bayesian model comparison and selection (Tan et al., 2012).
7. Implementation Considerations and Limitations
NCVMP is fully modular: factor-specific updates are algorithmically decoupled, and no model-specific data augmentation or differentiability of the loss is required (Castiglione et al., 2022). However, the quadrature step for each data point in nonconjugate terms induces a cost that scales linearly in the number of observations n; in high-dimensional settings or for complex likelihoods, computational efficiency relies on exploiting sparsity or using stochastic variants. Known limitations, common to VB, include posterior variance underestimation; stronger guarantees for calibration or uncertainty quantification may require alternative approximations or corrections (Tan et al., 2012).
References
- "Bayesian non-conjugate regression via variational message passing" (Castiglione et al., 2022)
- "Variational inference for sparse spectrum Gaussian process regression" (Tan et al., 2013)
- "Variational Inference for Generalized Linear Mixed Models Using Partially Noncentered Parametrizations" (Tan et al., 2012)
- "A stochastic variational framework for fitting and diagnosing generalized linear mixed models" (Tan et al., 2012)