
Theory and computation for structured variational inference

Published 13 Nov 2025 in stat.ML and cs.LG | (2511.09897v1)

Abstract: Structured variational inference constitutes a core methodology in modern statistical applications. Unlike mean-field variational inference, the approximate posterior is assumed to have interdependent structure. We consider the natural setting of star-structured variational inference, where a root variable impacts all the other ones. We prove the first results for existence, uniqueness, and self-consistency of the variational approximation. In turn, we derive quantitative approximation error bounds for the variational approximation to the posterior, extending prior work from the mean-field setting to the star-structured setting. We also develop a gradient-based algorithm with provable guarantees for computing the variational approximation using ideas from optimal transport theory. We explore the implications of our results for Gaussian measures and hierarchical Bayesian models, including generalized linear models with location family priors and spike-and-slab priors with one-dimensional debiasing. As a by-product of our analysis, we develop new stability results for star-separable transport maps which might be of independent interest.

Summary

  • The paper introduces star-structured variational inference (SSVI), a refinement of mean-field methods that incorporates hierarchical dependencies for improved posterior approximations.
  • It establishes existence and uniqueness of the SSVI minimizer with sharp non-asymptotic KL error bounds, leveraging self-consistency equations under strong log-concavity assumptions.
  • The study proves computational tractability via convex reparametrization using transport maps, enabling scalable, gradient-based optimization in high-dimensional Bayesian models.

Theory and Computation for Structured Variational Inference

Introduction and Setting

Structured variational inference (SVI) refines classical mean-field variational inference (MFVI) by incorporating explicit dependencies among latent variables in the variational approximation to the posterior. Whereas MFVI leverages independence for computational tractability, this independence introduces bias when the model's latent structure involves complex dependencies. SVI, and in particular star-structured variational inference (SSVI), models the dependency of leaf latent variables on a root variable, aligning with widespread hierarchical Bayesian and exchangeable models.

This paper provides the first detailed theoretical and computational analysis of SSVI. The authors address fundamental questions: under what conditions does a unique SSVI minimizer exist, what is its approximation quality to the true posterior, and can it be efficiently computed with provable guarantees? The approach is model-agnostic and applies to general log-concave posterior measures.

Existence, Uniqueness, and Self-Consistency of SSVI Minimizers

The SSVI projection solves

$$\pi^\star = \arg\min_{\mu \in C_{\text{star}}} \mathrm{KL}(\mu \,\|\, \pi)$$

where $C_{\text{star}}$ is the family of star-structured measures:

$$\mu(z_1,\ldots,z_d) = \mu_1(z_1) \prod_{j=2}^d \mu_j(z_j \mid z_1)$$

Under strong log-concavity of the target posterior (i.e., the potential $V$ satisfies $\nabla^2 V \succeq \alpha I$), the SSVI objective admits a unique minimizer. The minimizer factorizes as:

$$\pi^\star(z) = p^\star(z_1)\, q^\star(z_{-1} \mid z_1)$$

with $q^\star$ given by a conditional MFVI minimization for each fixed $z_1$.

A key result is a set of coupled self-consistency equations for the root and conditional leaf marginals:

$$p^\star(z_1) \propto \exp\left(-\int_0^{z_1} \int \partial_1 V(s, z_{-1})\, q^\star(dz_{-1} \mid s)\, ds\right)$$

$$q_i^\star(z_i \mid z_1) \propto \exp\left(-\int V(z_1, z_i, z_{-\{1,i\}}) \prod_{j \ge 2,\, j \ne i} q_j^\star(dz_j \mid z_1)\right)$$

These equations generalize the fixed-point equations for MFVI and transfer regularity properties from $V$ to the SSVI minimizer.
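On a discretized toy problem, the coupled self-consistency updates can be iterated directly. The sketch below is an illustration only, not the paper's transport-map algorithm: it runs the leaf mean-field updates and then the root equation for a quadratic potential $V(z) = \tfrac12 z^\top Q z$ on a grid. For a Gaussian target the resulting root marginal should recover the exact marginal variance $(Q^{-1})_{11}$. The matrix `Q`, grid, and iteration counts are arbitrary choices.

```python
import numpy as np

# Toy Gaussian target V(z) = z^T Q z / 2 with root z1 and leaves z2, z3.
Q = np.array([[2.0, 0.5, 0.4],
              [0.5, 2.0, 0.3],
              [0.4, 0.3, 2.0]])
g = np.linspace(-5, 5, 401)          # common 1-D grid for every coordinate
dz = g[1] - g[0]

def normalize(logw):
    w = np.exp(logw - logw.max(axis=-1, keepdims=True))
    return w / (w.sum(axis=-1, keepdims=True) * dz)

# Leaf conditionals q_i(. | z1), stored as (root grid) x (value grid).
q2 = normalize(np.zeros((len(g), len(g))))
q3 = normalize(np.zeros((len(g), len(g))))
for _ in range(50):
    m3 = (q3 * g).sum(axis=1) * dz   # E[z3 | z1] on the root grid
    q2 = normalize(-(0.5 * Q[1, 1] * g**2)[None, :]
                   - np.outer(Q[0, 1] * g + Q[1, 2] * m3, g))
    m2 = (q2 * g).sum(axis=1) * dz   # E[z2 | z1]
    q3 = normalize(-(0.5 * Q[2, 2] * g**2)[None, :]
                   - np.outer(Q[0, 2] * g + Q[1, 2] * m2, g))

# Root equation: p(z1) proportional to exp(-int^{z1} E_q[d/ds V(s, .)] ds).
m2 = (q2 * g).sum(axis=1) * dz
m3 = (q3 * g).sum(axis=1) * dz
dV = Q[0, 0] * g + Q[0, 1] * m2 + Q[0, 2] * m3
potential = np.concatenate([[0.0], np.cumsum((dV[1:] + dV[:-1]) / 2 * dz)])
p = normalize(-potential)

# For a Gaussian target the SSVI root marginal is exact: var = (Q^{-1})_{11}.
root_var = (p * g**2).sum() * dz - ((p * g).sum() * dz) ** 2
print(root_var, np.linalg.inv(Q)[0, 0])
```

The root marginal here is exact because the leaf conditional means converge to the true (linear) conditional means, making the root equation integrate the true marginal score.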

Error Bounds and Regularity Under Star-Graph Assumptions

The authors introduce a "root domination" condition on the mixed Hessian structure of $V$:

$$\partial_{11} V - \frac{\left\| \sum_{j=2}^d (\partial_{1j} V)^2 \right\|_{L^\infty}}{\ell_V} > 0$$

where $\ell_V$ is a uniform lower bound on the smallest eigenvalue of the principal submatrix of $\nabla^2 V$ excluding the root.

Given this, the SSVI Kullback–Leibler (KL) approximation error admits a sharp non-asymptotic upper bound:

$$\mathrm{KL}(\pi^\star \,\|\, \pi) \lesssim \sum_{i \ge 2} \sum_{j > i} \mathbb{E}_{\pi^\star}\big[(\partial_{ij} V)^2\big]$$

This bound scales with the magnitude of second-order mixed derivatives among the leaf variables and strictly improves upon analogous MFVI bounds when root–leaf interactions dominate.

Strong log-concavity and smoothness of the SSVI minimizer components are established. Marginals and conditionals are shown to inherit log-concavity and log-smoothness from VV, providing differentiability almost everywhere and facilitating subsequent transport regularity estimates.

Explicit Characterization: Gaussian and GLM Models

Gaussian Models

For Gaussian targets $\pi = N(m, \Sigma)$, SSVI has analytic solutions:

  • The SSVI marginal for the root equals the true marginal: $p^\star = N(m_1, \sigma_{11})$.
  • The conditional law of the leaves is Gaussian with diagonal covariance, obtained by mean-field approximation of the Schur complement of $\sigma_{11}$.
  • The SSVI solution improves on MFVI in KL divergence by an explicit log-ratio of variances (with $\bar{\pi}$ the MFVI minimizer and $\sigma^{11}$ the root entry of the precision matrix $\Sigma^{-1}$):

$$\mathrm{KL}(\pi^\star \,\|\, \pi) - \mathrm{KL}(\bar{\pi} \,\|\, \pi) = -\frac{1}{2}\log\left( \frac{\sigma_{11}}{(\sigma^{11})^{-1}} \right) \le 0$$

thus showing that SSVI is always at least as good as MFVI in the Gaussian case.
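This comparison can be checked numerically, using two standard Gaussian facts: the mean-field optimum uses variances $1/(\Sigma^{-1})_{ii}$, and the SSVI optimum keeps the exact root marginal while applying mean-field to the leaf conditional, whose covariance is the Schur complement of $\sigma_{11}$. The random covariance below is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
B = rng.standard_normal((d, d))
Sigma = B @ B.T + d * np.eye(d)      # arbitrary SPD covariance
Omega = np.linalg.inv(Sigma)         # precision matrix

def gauss_kl(A, S):
    """KL(N(0, A) || N(0, S)) for SPD matrices A, S."""
    _, ldS = np.linalg.slogdet(S)
    _, ldA = np.linalg.slogdet(A)
    return 0.5 * (np.trace(np.linalg.inv(S) @ A) - len(A) + ldS - ldA)

# MFVI: fully factorized Gaussian with variances 1/Omega_ii.
kl_mfvi = gauss_kl(np.diag(1.0 / np.diag(Omega)), Sigma)

# SSVI with coordinate 1 as root: exact root marginal (contributes zero KL)
# plus mean-field on the leaf conditional, which is Gaussian with covariance
# equal to the Schur complement S of sigma_11; its precision is Omega[1:,1:].
S_schur = Sigma[1:, 1:] - np.outer(Sigma[1:, 0], Sigma[0, 1:]) / Sigma[0, 0]
kl_ssvi = gauss_kl(np.diag(1.0 / np.diag(Omega)[1:]), S_schur)

gap = kl_ssvi - kl_mfvi
predicted = -0.5 * np.log(Sigma[0, 0] * Omega[0, 0])  # the log-ratio formula
print(gap, predicted)
```

Since $\sigma_{11}\,\sigma^{11} \ge 1$ always (marginal variance dominates conditional variance), the gap is nonpositive, matching the displayed formula.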

Generalized Linear Models

For Bayesian GLMs with location family hierarchies or spike-and-slab variable selection priors, explicit SSVI error bounds are derived:

  • For GLMs with location priors, KL error scales with squared off-diagonal elements of the design matrix $A(\beta)$.
  • In Bayesian linear regression, the approximation error is small when $A$ is diagonally dominant.
  • For spike-and-slab models with debiased priors (following [Castillo et al., 2024]), SSVI yields valid upper bounds whenever interaction and prior curvature parameters satisfy explicit inequalities.

For random Gaussian design matrices, explicit scaling constants are given, and SSVI is shown to succeed with high probability in high-dimensional regimes, contingent on root–leaf dominance in the sample covariance.
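The interaction scales behind such high-probability statements are easy to check empirically. The snippet below is a generic sanity check, not taken from the paper: for a Gaussian design the off-diagonal Gram entries are $O(\sqrt{n})$ while diagonal entries are of order $n$, so the off-to-diagonal ratio driving the error bounds shrinks like $1/\sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 10_000, 20
X = rng.standard_normal((n, d))      # random Gaussian design
G = X.T @ X                          # Gram / sample covariance (unscaled)

# Diagonal entries concentrate near n; off-diagonal entries are sums of n
# independent products of standard normals, hence O(sqrt(n)).
off = np.abs(G - np.diag(np.diag(G))).max()
ratio = off / np.diag(G).min()
print(ratio)                         # small for large n, ~ 1/sqrt(n)
```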

Computational Guarantees via Optimal Transport

A principal computational innovation is to reparametrize the SSVI optimization as a convex problem over star-separable transport maps emanating from a reference measure (typically $N(0, I)$):

$$T^\star = \arg\min_{T \in T_{\text{star}}} \mathrm{KL}(T_\# \rho \,\|\, \pi)$$

where $T_{\text{star}}$ comprises Knothe–Rosenblatt (KR) maps of star-graph form.
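For intuition, a star-separable KR-type map can be written out concretely. The toy affine map below (with arbitrary illustrative parameters) transports $\rho = N(0, I)$ on $\mathbb{R}^3$: each leaf coordinate may depend on the root and on its own reference coordinate, but on no other leaf, and monotonicity in the own coordinate is the KR requirement.

```python
import numpy as np

def star_map(z, a=1.0, b=(0.8, 1.2), c=(0.5, -0.3)):
    """T(z1,z2,z3) = (a*z1, b2*z2 + c2*z1, b3*z3 + c3*z1).

    Each coordinate map is increasing in its own variable (a, b_j > 0),
    the monotonicity required of Knothe-Rosenblatt maps; leaves depend
    on the root but not on each other.
    """
    z1, z2, z3 = z[..., 0], z[..., 1], z[..., 2]
    return np.stack([a * z1,
                     b[0] * z2 + c[0] * z1,
                     b[1] * z3 + c[1] * z1], axis=-1)

rng = np.random.default_rng(1)
Z = rng.standard_normal((100_000, 3))   # reference rho = N(0, I)
X = star_map(Z)                         # pushforward T#rho

# The pushforward is star-structured: leaves are coupled only through the
# root, so removing the root's (here known, linear) contribution leaves
# essentially uncorrelated residuals.
resid2 = X[:, 1] - 0.5 * X[:, 0]
resid3 = X[:, 2] + 0.3 * X[:, 0]
corr = np.corrcoef(resid2, resid3)[0, 1]
print(corr)                             # approximately 0
```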

Convexity of the set of such maps is established, and existence and uniqueness follow from convexity of the lifted objective and the topology induced by the adapted Wasserstein metric. This approach circumvents non-convexity in the space of structured measures and aligns with recent developments in measure–map equivalence under optimal transport [Beiglböck et al., 2023].

The regularity theory developed yields explicit Lipschitz and higher-order derivative bounds for coordinate-wise and mixed derivatives of the optimal map $T^\star$:

  • The root coordinate map $T_1^\star$ and the leaf conditional maps $T_i^\star(\cdot \mid z_1)$ are bounded in their first and second derivatives.
  • Mixed derivatives are explicitly bounded under strengthened root-domination and log-concavity assumptions.

Scalable Convex Dictionary Approximation and Gradient Algorithm

A piecewise linear dictionary approximation for star-structured transport maps is constructed, generalizing techniques for MFVI [Jia et al., 2025]. The authors show that for any $\epsilon > 0$, a finite-dimensional convex parameter set suffices to approximate $T^\star$ to $O(\epsilon)$ accuracy in $L^2(\rho)$ norm, with parameter dimension $O\big(d^2 \epsilon^{-2} \log(d/\epsilon^2)\big)$.
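A one-dimensional caricature of the dictionary idea (not the paper's multivariate construction): represent an increasing map as a non-negative combination of a linear term and ReLU hinge functions. The constraint set $\{b \ge 0,\ w \ge 0\}$ is convex, so convex combinations of valid parameter vectors again give valid monotone maps, which is what makes gradient-based optimization over the dictionary well-posed. The knot grid and weights below are arbitrary.

```python
import numpy as np

knots = np.linspace(-3, 3, 25)       # arbitrary knot grid

def T(z, b, w):
    """Piecewise-linear map b*z + sum_k w_k * relu(z - t_k).

    Increasing whenever b >= 0 and w >= 0, since its slope is
    b + (sum of active w_k) >= 0 everywhere.
    """
    return b * z + np.maximum(z[:, None] - knots[None, :], 0.0) @ w

z = np.linspace(-4, 4, 1001)
rng = np.random.default_rng(4)
b1, w1 = 1.0, rng.uniform(0, 0.2, knots.size)   # two valid parameter vectors
b2, w2 = 0.5, rng.uniform(0, 0.2, knots.size)

# Any convex combination of valid parameters stays in the set, hence monotone.
monotone = all(
    bool(np.all(np.diff(T(z, lam * b1 + (1 - lam) * b2,
                            lam * w1 + (1 - lam) * w2)) >= 0))
    for lam in (0.0, 0.3, 1.0)
)
print(monotone)
```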

Projected gradient descent over this convex parameter space is shown to converge exponentially fast to the optimal dictionary approximation, with step size and convergence rate determined by log-smoothness and log-concavity constants. The analysis accommodates the nontrivial geometry of entropy over transport maps and explicitly computes the necessary Lipschitz constants.
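For a Gaussian target the transport parametrization and the projected gradient scheme can be written out in closed form. The sketch below is a toy affine-star stand-in for the piecewise-linear dictionary, with an arbitrary target and step size: it minimizes the exact KL over matrices $A$ supported on the star pattern (diagonal plus first column), projects the diagonal to stay positive (monotone maps), and compares against the analytic SSVI optimum for the Gaussian case.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
B = rng.standard_normal((d, d))
Sigma = B @ B.T + d * np.eye(d)      # strongly log-concave Gaussian target
Omega = np.linalg.inv(Sigma)

# Affine star-separable maps T(z) = A z with A supported on the star
# pattern; then T#N(0,I) = N(0, A A^T) and the KL objective is convex in
# the star entries of A (quadratic trace term plus -sum log A_ii).
mask = np.eye(d, dtype=bool)
mask[:, 0] = True

def kl(A):
    """KL(N(0, A A^T) || N(0, Sigma))."""
    C = A @ A.T
    return 0.5 * (np.trace(Omega @ C) - d
                  - np.linalg.slogdet(C)[1] + np.linalg.slogdet(Sigma)[1])

A = np.eye(d)
for _ in range(5000):
    grad = Omega @ A - np.linalg.inv(A).T              # gradient of kl in A
    A -= 0.05 * np.where(mask, grad, 0.0)              # step on star entries only
    np.fill_diagonal(A, np.maximum(np.diag(A), 1e-3))  # projection: monotone maps

# Analytic SSVI optimum for a Gaussian: exact root marginal plus mean-field
# leaves conditional on the root (Schur complement of sigma_11).
S = Sigma[1:, 1:] - np.outer(Sigma[1:, 0], Sigma[0, 1:]) / Sigma[0, 0]
kl_opt = 0.5 * (np.linalg.slogdet(S)[1] + np.log(np.diag(Omega)[1:]).sum())
print(kl(A), kl_opt)
```

The affine-star family here coincides with the mean-zero Gaussian star-structured measures, so plain projected gradient descent on this convex objective converges to the SSVI optimum.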

Structured VI Beyond Star Graphs

The framework is shown to be extensible in principle to tree-structured variational families via KR maps appropriate to arbitrary dependency graphs. However, for general trees, convexity and regularity properties become technically more challenging; direct computation may be infeasible due to the increase in conditional dependency and depth in the map construction, highlighting the unique computational advantage of star-structured SVI.

Implications and Future Directions

This work rigorously validates structured variational approximations beyond mean-field, in both theory and computation. It formally demonstrates that introducing appropriate dependencies can significantly improve approximation quality without sacrificing computational tractability, provided the structure is amenable to transport-based convexification. The framework integrates nonparametric optimal transport theory with structural graphical modeling, laying a foundation for scalable computation in hierarchical and partially exchangeable Bayesian models.

For practical inference tasks, such as multilevel regression, variable selection in genomics, and state-space models, the results show that SSVI can be implemented via explicit gradient-based schemes with provable error bounds.

Future questions involve extending transport-based convexification to deeper trees, quantifying gains of increasing structural complexity, and connecting transport regularity to asymptotic and finite-sample risk. Structured VI for time series, partially observed networks, and latent variable models is expected to benefit from these developments. There is also scope for further refinement of regularity theory for transport on unbounded domains and integration with entropic regularization and particle-based VI methods.

Conclusion

The paper establishes existence, uniqueness, regularity, and computational tractability for star-structured variational inference, uniting the theoretical and algorithmic sides of the problem. SSVI offers quantifiably improved posterior approximations in models with hierarchical structure and can be solved in polynomial time via convex transport map optimization. This analysis bridges graphical modeling, convex analysis, and optimal transport, advancing both the statistical and machine learning understanding of structured variational methods.
