
Variance-Reduced SGHMC

Updated 18 January 2026
  • Variance-reduced SGHMC is a class of methods that integrates stochastic gradients with techniques like SVRG and SAGA to lower variance in Bayesian sampling.
  • These methods improve convergence rates and offer tighter theoretical guarantees by reducing the variance inherent in minibatch approximations.
  • Empirical studies demonstrate faster convergence and robust performance in tasks such as regression and neural network inference compared to standard SGHMC.

Variance-reduced Stochastic Gradient Hamiltonian Monte Carlo (VR-SGHMC) refers to a class of methods that enhance the efficiency and scalability of Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) by incorporating variance reduction schemes, primarily those developed within the stochastic optimization literature. These algorithms enable efficient Bayesian inference for large datasets by addressing the inherent variance in stochastic gradient approximations, thereby improving both theoretical convergence guarantees and empirical performance in high-dimensional or nonconvex Bayesian models (Zou et al., 2018; Li et al., 2018; Chen et al., 2017; Hu et al., 2021).

1. Theoretical Background and Motivation

SGHMC algorithms are discretizations of underdamped Langevin dynamics or Hamiltonian systems used for posterior sampling. Conventional HMC employs the exact gradient of the negative log-posterior, $\nabla f(\theta)$, which becomes infeasible for large-scale problems. SGHMC replaces the full gradient with a stochastic estimator $\widetilde\nabla f(\theta)$, generally computed over minibatches. However, naive stochastic gradients exhibit high variance, resulting in slow mixing, poor approximation of the posterior, and degraded downstream estimates.

Variance-reduction methods, such as SVRG and SAGA, originally developed for stochastic optimization, construct control variates by leveraging either a full-gradient snapshot or a table of component-wise gradients. In the SGHMC context, these methods produce unbiased or biased gradient estimates with guaranteed lower variance, directly improving the quality of the Markov chain both in theory and practice (Zou et al., 2018; Li et al., 2018; Chen et al., 2017; Hu et al., 2021).

2. Variance-Reduction Schemes in SGHMC

Variance-reduced SGHMC methods typically employ one of four estimators:

| Estimator | Bias | Mechanism |
|---|---|---|
| SVRG | Unbiased | Full-gradient snapshot, epoch-wise |
| SAGA | Unbiased | Per-component table, per-iteration |
| SARAH | Biased | Recursive single-pass estimator |
| SARGE | Biased | Per-component recursive memory |

  • SVRG: Maintains a reference (snapshot) point $\tilde{x}$ at which the full gradient $\nabla f(\tilde{x})$ is computed. Within each epoch, stochastic gradients are formed as

$$\widetilde\nabla = \nabla f(\tilde{x}) + \frac{N}{b} \sum_{i \in B_k} \left[\nabla f_i(x_k) - \nabla f_i(\tilde{x})\right],$$

where $B_k$ is a minibatch of size $b$ (Hu et al., 2021; Li et al., 2018; Zou et al., 2018).

  • SAGA: Maintains a table of the most recent component gradients $\phi^i$ and, at each step, updates entries for a random batch while forming

$$\widetilde\nabla = \frac{N}{b}\sum_{i \in B_k} \left[\nabla f_i(x_k) - \phi^i\right] + \sum_{i=1}^N \phi^i$$

(Hu et al., 2021; Li et al., 2018).

  • SARAH/SARGE: Deploy recursive estimators with single-pass or table-based memory. They are biased in general but can exhibit smaller mean-squared-error (MSE) in certain regimes (Hu et al., 2021).

These estimators are seamlessly embedded in the SGHMC integrator, replacing the naive minibatch gradient.
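As a concrete illustration, the two unbiased estimators can be sketched in NumPy. The function names and the sum-over-components convention $f = \sum_i f_i$ are this sketch's own assumptions, not part of any specific implementation:

```python
import numpy as np

def svrg_gradient(grad_i, x, x_snap, full_grad_snap, batch, N):
    # SVRG: full gradient at the snapshot plus a minibatch correction.
    # grad_i(i, x) returns the component gradient ∇f_i(x).
    b = len(batch)
    corr = sum(grad_i(i, x) - grad_i(i, x_snap) for i in batch)
    return full_grad_snap + (N / b) * corr

def saga_gradient(grad_i, x, table, batch, N):
    # SAGA: per-component gradient table; the table sum uses the
    # pre-update entries, matching the estimator formula above.
    b = len(batch)
    table_sum = table.sum(axis=0)
    corr = np.zeros_like(x)
    for i in batch:
        g = grad_i(i, x)
        corr += g - table[i]
        table[i] = g  # refresh the stored component gradient
    return (N / b) * corr + table_sum
```

With a full batch, both estimators reduce exactly to the full gradient, which is a convenient sanity check on an implementation.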

3. Algorithmic Frameworks

The core dynamics for variance-reduced SGHMC are based on the discretized underdamped Langevin SDE:

$$\begin{aligned} d\theta &= p\,dt, \\ dp &= -\nabla f(\theta)\,dt - Dp\,dt + \sqrt{2D}\,dW_t, \end{aligned}$$

where $D$ is the friction parameter and $W_t$ is a $d$-dimensional Wiener process. Naive Euler discretization with a variance-reduced gradient $\widetilde\nabla$ at each step leads to updates of the form:

$$\begin{aligned} p_{t+1} &= (1-Dh)p_t - h\,\widetilde\nabla_t + \sqrt{2Dh}\,\xi_t, \\ \theta_{t+1} &= \theta_t + h\,p_{t+1}, \end{aligned}$$

where $\xi_t$ is a standard normal vector.
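A minimal NumPy sketch of this Euler update; the function and argument names are illustrative, and `grad_est` stands in for any variance-reduced gradient estimator:

```python
import numpy as np

def sghmc_step(theta, p, grad_est, h, D, rng):
    # One Euler step of the discretized underdamped dynamics:
    # friction and gradient kick plus injected noise, then a drift.
    xi = rng.standard_normal(theta.shape)
    p = (1.0 - D * h) * p - h * grad_est(theta) + np.sqrt(2.0 * D * h) * xi
    theta = theta + h * p
    return theta, p
```

Iterating this step with a fresh `xi` each time yields the SGHMC chain; with step size $h \to 0$ the update leaves the state unchanged, as expected.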

Higher-order splitting schemes, notably symmetric splitting, further decrease discretization bias, as established in (Li et al., 2018). In symmetric splitting, each iteration is decomposed as:

$$\begin{aligned} \theta^{(1)} &= \theta + \frac{h}{2}p, \\ p^{(1)} &= e^{-Dh/2}p, \\ p^{(2)} &= p^{(1)} - h\,\widetilde\nabla(\theta^{(1)}) + \sqrt{2Dh}\,\xi, \\ p^+ &= e^{-Dh/2}p^{(2)}, \\ \theta^+ &= \theta^{(1)} + \frac{h}{2}p^+, \end{aligned}$$

which improves the discretization term in the MSE bound from $O(h^2)$ to $O(h^4)$ (Li et al., 2018).
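The symmetric-splitting iteration can likewise be sketched in NumPy (names are illustrative; `grad_est` is any variance-reduced gradient estimator):

```python
import numpy as np

def sghmc_split_step(theta, p, grad_est, h, D, rng):
    # Symmetric splitting: half drift, half friction, full gradient
    # kick with noise, half friction, half drift.
    theta = theta + 0.5 * h * p
    p = np.exp(-0.5 * D * h) * p
    xi = rng.standard_normal(theta.shape)
    p = p - h * grad_est(theta) + np.sqrt(2.0 * D * h) * xi
    p = np.exp(-0.5 * D * h) * p
    theta = theta + 0.5 * h * p
    return theta, p
```

Note that the gradient is evaluated at the half-drifted position $\theta^{(1)}$, which is where the higher-order cancellation of discretization error comes from.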

4. Convergence Theory and Gradient Complexity

Central to the analysis of VR-SGHMC are precise non-asymptotic bounds on mean-square error (MSE) and 2-Wasserstein distance to the target posterior. The bias and variance terms are governed by the properties of the gradient estimator—particularly through the Mean-Squared-Error-Bias (MSEB) property (Hu et al., 2021).

  • Unbiased estimators (SVRG, SAGA): For strongly log-concave posteriors, achieve gradient complexity of

$$\widetilde{O}\left(N + \kappa^{2} d^{1/2} \varepsilon^{-1} + \kappa^{4/3} d^{1/3} N^{2/3} \varepsilon^{-2/3}\right)$$

to reach $\varepsilon$-accuracy in 2-Wasserstein distance, substantially improving over standard SGHMC and full-gradient HMC in practical regimes (Zou et al., 2018; Hu et al., 2021).

  • Biased estimators (SARAH, SARGE): Reduce the $N$-dependence in gradient complexity, achieving

$$\widetilde{O}\left(N + \sqrt{N}\,\kappa^{2} d^{1/2} \varepsilon^{-1}\right)$$

but with a weaker $\varepsilon^{-1}$ dependence (Hu et al., 2021).

The table summarizes regime differences:

| Method | Gradient Complexity | Bias | Best Regime |
|---|---|---|---|
| SVRG-HMC, SAGA | $\widetilde{O}(N + \kappa^2 d^{1/2} \varepsilon^{-1} + N^{2/3} \kappa^{4/3} d^{1/3} \varepsilon^{-2/3})$ | Unbiased | High precision, moderate $N$ |
| SARAH, SARGE | $\widetilde{O}(N + \sqrt{N}\,\kappa^2 d^{1/2} \varepsilon^{-1})$ | Biased | Moderate accuracy, large $N$ |

The bounds depend explicitly on the smoothness $L$, strong convexity $m$, minibatch size $b$, and tuning (snapshot interval $p$) (Zou et al., 2018; Hu et al., 2021).

5. Hyperparameter Selection and Practical Aspects

Empirical and theoretical recommendations for hyperparameters are as follows:

  • Minibatch size ($b$): Modest; e.g., $b = 10$ is typical for SVRG/SAGA schemes (Li et al., 2018).
  • Snapshot interval ($p$, $m$): For SVRG, $p = N/b$; typically $m = 10$, with $n_1 \gg n_2$ for two-batch control variates (Chen et al., 2017).
  • Step size ($h$): Symmetric splitting allows larger $h$; $Dh < 1$ must be satisfied (Li et al., 2018).
  • Friction ($D$, $\gamma$): $1 \leq D \leq 10$; larger $D$ leads to more rapid velocity dissipation.
  • Control-variate batch ($n_1$) and online batch ($n_2$): $n_2 = 10$–$100$, $n_1 = 10\,n_2$, with an update every $m = 10$ iterations (Chen et al., 2017).

Pragmatic implementation replaces the SGHMC gradient with the control-variate estimate; memory and computation scale with the size of auxiliary tables for SAGA-type methods.
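These recommendations can be collected into an illustrative configuration helper. The keys and defaults below are this sketch's own packaging of the values above, not any library's API:

```python
def vr_sghmc_defaults(N, b=10):
    # Illustrative starting values drawn from the cited recommendations.
    D = 5.0    # friction, within the suggested range [1, 10]
    h = 1e-3   # step size; chosen so that D * h < 1 holds
    n2 = 10    # online batch size
    return {
        "batch_size": b,
        "snapshot_interval": N // b,  # p = N / b for SVRG
        "step_size": h,
        "friction": D,
        "online_batch": n2,
        "cv_batch": 10 * n2,          # n1 = 10 * n2
        "cv_update_every": 10,        # m
    }
```

For example, `vr_sghmc_defaults(10000)` yields a snapshot interval of 1000 and satisfies the $Dh < 1$ constraint.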

6. Empirical Performance and Applications

Experimental results demonstrate consistently faster convergence and lower estimator variance for VR-SGHMC methods compared to standard SGHMC and Langevin approaches. In large-scale Bayesian regression, classification, and neural network inference tasks, SVRG2nd-HMC and its SAGA/HMC variants exhibit the fastest convergence in both training and test metrics. Key metrics include test mean-squared error (MSE), test negative log-likelihood, and root-MSE for Bayesian neural networks.

  • On the UCI “concrete” dataset, after 5 data passes:
    • SGHMC test MSE $\approx 6.3$
    • SVRG-HMC test MSE $\approx 5.2$
    • SVRG2nd-HMC test MSE $\approx 4.8$
  • On the “protein” dataset (BNN, after 2 passes):
    • SGHMC test RMSE $\approx 5.8$
    • SVRG2nd-HMC test RMSE $\approx 4.9$ (Li et al., 2018).

Speedups of $2\times$–$10\times$ in convergence are observed across Bayesian regression, classification, deep neural networks (MLP, CNN, ResNet), and LLMs (Chen et al., 2017; Li et al., 2018). Variance reduction also yields smoother learning curves and more robust out-of-sample performance.

7. Comparison and Extensions

Variance-reduced SGHMC variants outperform traditional stochastic gradient MCMC methods (SGHMC, SGLD, VR-SGLD) across a wide range of regimes in both theory and practice.

  • For strongly convex log-posteriors, VR-SGHMC achieves a mixed-regime complexity strictly better than standard HMC or SGHMC except in rare limits (i.e., extremely large $N$) (Zou et al., 2018).
  • Higher-order symmetric splitting can further accelerate mixing and reduce discretization error to $O(h^4)$ (Li et al., 2018).
  • For general (non-strongly convex) log-concave targets, extensions use quadratic regularization and achieve comparable bounds modulo $d$-dependent terms (Zou et al., 2018).

Unbiased (SVRG/SAGA) and biased (SARAH/SARGE) VR-SGHMC schemes entail a trade-off between asymptotic bias and mean-square error: unbiased methods excel in high-precision regimes, while biased estimators are more attractive when moderate precision suffices or the dataset is very large (Hu et al., 2021).

References

  • "Stochastic Variance-Reduced Hamilton Monte Carlo Methods" (Zou et al., 2018)
  • "Stochastic Gradient Hamiltonian Monte Carlo with Variance Reduction for Bayesian Inference" (Li et al., 2018)
  • "A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC" (Chen et al., 2017)
  • "A New Framework for Variance-Reduced Hamiltonian Monte Carlo" (Hu et al., 2021)
