
Critically-Damped Langevin Diffusion

Updated 17 January 2026
  • Critically-Damped Langevin Diffusion is a stochastic differential framework that augments traditional diffusion with momentum variables to achieve the fastest non-oscillatory convergence.
  • It improves theoretical mixing rates and sampling efficiency in score-based generative modeling, with empirical demonstrations on image synthesis and privacy metrics.
  • Employing discretization methods such as symmetric splitting and Euler–Maruyama, CLD controls discretization error while delivering state-of-the-art generative performance.

Critically-Damped Langevin Diffusion (CLD) is a stochastic differential equation (SDE) framework used within score-based generative modeling and diffusion processes. By introducing auxiliary velocity (or momentum) variables and tuning frictional parameters to a critical value, CLD achieves the fastest non-oscillatory convergence to equilibrium in linear settings. This approach augments standard diffusion models and enables efficient data sampling, improved theoretical mixing rates, and new privacy properties.

1. Mathematical Formulation

CLD augments each data sample $x_t \in \mathbb{R}^d$ with an auxiliary velocity $v_t \in \mathbb{R}^d$, creating a phase space $u_t = (x_t, v_t) \in \mathbb{R}^{2d}$. The standard forward (noising) SDE is

$$
\begin{cases}
dx_t = M^{-1} v_t\,\beta\,dt \\
dv_t = -\nabla_{x_t}\!\left(\tfrac12\|x_t\|^2\right)\beta\,dt - \Gamma M^{-1} v_t\,\beta\,dt + \sqrt{2\Gamma\beta}\,dW_t
\end{cases}
$$

or in vector-matrix form,

$$
d\begin{pmatrix} x_t \\ v_t \end{pmatrix} = \left[\begin{pmatrix} M^{-1} v_t \\ -x_t \end{pmatrix}\beta + \begin{pmatrix} 0 \\ -\Gamma M^{-1} v_t \end{pmatrix}\beta\right] dt + \begin{pmatrix} 0 \\ \sqrt{2\Gamma\beta} \end{pmatrix} dW_t.
$$

Here, $M$ is a positive mass scalar, $\Gamma$ the friction parameter, and $\beta$ a time-scaling constant. The “critical damping” regime corresponds to $\Gamma^2 = 4M$ (Dockhorn et al., 2021), yielding the fastest return to equilibrium without oscillations.
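As a concrete illustration (a toy numpy sketch with illustrative parameters $\beta = 1$, $M = 1$, $\Gamma = 2$, not drawn from any of the cited papers), an Euler–Maruyama simulation of this forward SDE shows the joint $(x, v)$ law relaxing to the stationary $\mathcal{N}(0, I_2)$:

```python
import numpy as np

# Minimal Euler-Maruyama simulation of the forward CLD SDE with
# beta = 1, M = 1, Gamma = 2 (critical damping); all parameter
# choices here are illustrative.
rng = np.random.default_rng(0)
n, dt, steps = 20000, 0.01, 1000          # particles, step size, horizon T = 10

x = np.ones(n)                            # "data": a point mass at x = 1
v = np.zeros(n)                           # velocities start at rest
for _ in range(steps):
    dW = rng.standard_normal(n) * np.sqrt(dt)
    x, v = (x + v * dt,
            v + (-x - 2.0 * v) * dt + 2.0 * dW)   # sqrt(2*Gamma*beta) = 2

# After T = 10 the joint law should be close to N(0, I_2).
print(x.mean(), x.var(), v.var())
```

Running this yields a sample mean near $0$ and sample variances near $1$ for both coordinates, consistent with the stationary law, up to the small $O(dt)$ bias that Euler–Maruyama introduces in the stationary variances.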

The corresponding reverse-time SDE (the key object for generative modeling), written for $M = 1$, $\Gamma = 2$, $\beta = 1$, is

$$
\begin{cases}
dx_t = -v_t\,dt \\
dv_t = \left[x_t + 2 v_t + 4\,\nabla_v \log q_{T-t}(x_t, v_t)\right] dt + 2\,dB_t
\end{cases}
$$

with the learned “velocity-score” $\nabla_v \log q_t$ replaced by a neural estimate $s_t$ on time intervals $[kh, (k+1)h]$ (Chen et al., 2022).

2. Theoretical Properties and Mixing Rates

Critical damping is derived from the characteristic polynomial of the deterministic part: $\lambda^2 + \Gamma\lambda + 1 = 0$. Setting $\Gamma = 2$ for $M = 1$ gives coincident eigenvalues at $-1$, guaranteeing the fastest non-oscillatory convergence (Du et al., 2022). In the linear SDE, this yields a unique stationary law $\mathcal{N}(0, I_{2d})$ and, by Fokker–Planck analysis, maximizes the spectral gap among all choices of $\Gamma$ for fixed stiffness (Sterling et al., 17 Sep 2025).
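This eigenvalue claim is easy to verify numerically; the sketch below (with $M = 1$, $\beta = 1$) computes the spectral abscissa of the deterministic drift for under-, critically, and over-damped choices of $\Gamma$:

```python
import numpy as np

def spectral_abscissa(gamma):
    """Largest real part of the eigenvalues of the deterministic CLD
    drift [[0, 1], [-1, -gamma]] (M = 1, beta = 1); more negative
    means faster decay to equilibrium."""
    A = np.array([[0.0, 1.0], [-1.0, -gamma]])
    return np.linalg.eigvals(A).real.max()

under, critical, over = (spectral_abscissa(g) for g in (1.0, 2.0, 4.0))
print(under, critical, over)   # critical damping decays fastest
```

The underdamped choice $\Gamma = 1$ gives decay rate $1/2$, the overdamped $\Gamma = 4$ gives $2 - \sqrt{3} \approx 0.27$, and the critical $\Gamma = 2$ gives the best rate $1$.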

For strongly convex quadratic targets, CLD guarantees exponential convergence of the joint $(x_t, v_t)$ law to equilibrium at the optimal rate, determined by the critically damped linear drift. In the generalized CLD with position-noise regularization (elliptic extension),

$$
\begin{cases}
dX_t = V_t\,dt + \varepsilon\,dB_t \\
dV_t = -\gamma V_t\,dt - \nabla U(X_t)\,dt + \sigma\,dW_t
\end{cases}
$$

one can optimize the additional noise $\varepsilon$ to balance mixing speed and discretization error (Strasman et al., 4 Nov 2025).
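For a concrete quadratic example (hypothetical values $U(x) = x^2/2$, $\gamma = 2$, $\sigma = 2$, $\varepsilon = 0.5$, chosen for illustration rather than taken from the cited paper), the stationary covariance of this system can be obtained by integrating the Lyapunov ODE $\Sigma' = A\Sigma + \Sigma A^\top + GG^\top$ to its fixed point; the position noise $\varepsilon$ inflates the stationary covariance relative to the $\varepsilon = 0$ case, where it would be exactly $I$:

```python
import numpy as np

# Stationary covariance of the regularized CLD (illustrative values:
# quadratic U(x) = x^2/2, gamma = 2, sigma = 2, position noise eps = 0.5).
gamma, sigma, eps = 2.0, 2.0, 0.5
A = np.array([[0.0, 1.0], [-1.0, -gamma]])          # drift of (X, V)
GGt = np.diag([eps**2, sigma**2])                   # diffusion covariance

# Integrate the Lyapunov ODE  Sigma' = A Sigma + Sigma A^T + G G^T
# with small Euler steps; its fixed point is the stationary covariance.
Sig = np.zeros((2, 2))
dt = 0.002
for _ in range(25000):                              # horizon t = 50
    Sig = Sig + dt * (A @ Sig + Sig @ A.T + GGt)

print(Sig)   # eps > 0 perturbs the eps = 0 stationary law N(0, I)
```

For this linear case the result can be checked against the closed form $\Sigma_{xv} = -\varepsilon^2/2$, $\Sigma_{vv} = (\sigma^2 + \varepsilon^2)/(2\gamma)$, $\Sigma_{xx} = \Sigma_{vv} + \gamma\varepsilon^2/2$, obtained by solving the stationary Lyapunov equation by hand.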

3. Score-Based Generative Modeling and Training Objectives

In CLD-based score-based generative models (SGMs), only the velocity variable is directly noised. The associated conditional density $p_t(v_t \mid x_t)$ remains closer to Gaussian throughout the process, making the required score-matching task (learning $\nabla_v \log p_t(v \mid x)$) simpler than in purely overdamped cases (Dockhorn et al., 2021). The continuous-time score-matching loss is

$$
\mathcal{L}(\theta) = \mathbb{E}_{t,\, u_t \sim p_t}\left[\lambda(t)\,\big\| s_\theta(u_t, t) - \nabla_{u_t} \log p_t(u_t) \big\|^2\right],
$$

which reduces to a ‘hybrid’ denoising-style score-matching objective over the velocity conditional. The theoretical framework generalizes standard variational perspectives by allowing parameterized forward SDEs and using block-constant Riemannian + symplectic drift structures (Du et al., 2022).
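The denoising reduction can be illustrated with a toy Gaussian perturbation kernel (the numbers below are arbitrary; the actual CLD kernel is also Gaussian, with mean and covariance determined by the linear SDE): writing $u = \mu + L\epsilon$ with $\Sigma = LL^\top$, the conditional score is $-L^{-\top}\epsilon$, and the Monte-Carlo loss vanishes exactly for that score:

```python
import numpy as np

# Toy denoising score-matching check for a Gaussian perturbation kernel
# N(mu, Sigma); mu and Sigma here are illustrative placeholders.
rng = np.random.default_rng(0)
mu = np.array([0.5, -0.2])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
L = np.linalg.cholesky(Sigma)
Linv = np.linalg.inv(L)

eps = rng.standard_normal((4096, 2))
u = mu + eps @ L.T                      # samples u ~ N(mu, Sigma)

def dsm_loss(score_fn):
    """Monte-Carlo denoising score-matching loss: for u = mu + L eps
    the regression target is the conditional score -L^{-T} eps."""
    target = -eps @ Linv                # rows are -L^{-T} eps
    return np.mean(np.sum((score_fn(u) - target) ** 2, axis=1))

exact = lambda w: -(w - mu) @ np.linalg.inv(Sigma).T   # true Gaussian score
zero = lambda w: np.zeros_like(w)                      # trivial baseline
print(dsm_loss(exact), dsm_loss(zero))  # exact conditional score -> loss 0
```

Since $-\Sigma^{-1}(u - \mu) = -L^{-\top}\epsilon$ identically, the loss for the exact score is zero up to floating-point rounding, while any other model incurs a strictly positive penalty.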

4. Sampling Algorithms and Discretization

Sampling from the CLD-based reverse process can be discretized using either Euler–Maruyama (Du et al., 2022, Sterling et al., 17 Sep 2025) or more advanced symmetric splitting methods. The Symmetric Splitting CLD Sampler (SSCS) decomposes the reverse-SDE generator into an affine “OU” step (solved exactly) and a score-driven ODE in the $v$ coordinates, iterating via Strang splitting (Dockhorn et al., 2021). Compared to simple Euler–Maruyama, the SSCS approach yields higher sampling accuracy and requires fewer function evaluations to achieve similar generative quality (Dockhorn et al., 2021).

In the context of minimal data assumptions, under an $L^2$-accurate learned score, optimal convergence to the target law (in total variation or $W_2$ Wasserstein distance) is guaranteed (Chen et al., 2022, Strasman et al., 4 Nov 2025). The discretization error and the choice of step size $h \lesssim 1/L$ are crucial; theoretical analysis bounds the number of required steps as $N = \widetilde{\Theta}(L^2 d / \varepsilon^2)$ to achieve error $\varepsilon$ (Chen et al., 2022).

5. Comparison to Overdamped and Higher-Order Langevin Diffusions

CLD extends the classical overdamped Langevin setting ($dx = -\nabla U(x)\,dt + \sqrt{2}\,dW_t$) to include explicit momentum coupling. In strongly convex and $L$-smooth settings, the mixing time for CLD scales as $O(\sqrt{\kappa}\log(1/\varepsilon))$ in the condition number $\kappa$, compared to $O(\kappa\log(1/\varepsilon))$ for the overdamped case (Sterling et al., 26 Jun 2025). However, in the SGM setting with learned scores, CLD does not improve the dependence on the ambient dimension $d$ relative to classical DDPM, because the velocity-score estimation error accumulates linearly in $d$ (Chen et al., 2022).

Recent work generalizes CLD to third-order (TOLD++) and higher-order Langevin diffusions (HOLD++), where critical damping of the system matrix maximizes contraction rate and leads to further reductions in mixing times and improved FID on real and synthetic datasets (Sterling et al., 2024, Sterling et al., 26 Jun 2025).

6. Applications and Empirical Performance

CLD-based SGMs have achieved state-of-the-art generative quality on large-scale datasets. Examples:

  • On CIFAR-10, with an approximately 100M-parameter U-Net and 2000 function evaluations, CLD-SGM achieves FID $\approx 2.23$ (reverse SDE) and $\approx 2.25$ (probability flow ODE) (Dockhorn et al., 2021).
  • On CelebA-HQ-256, CLD combined with SSCS produces high-fidelity 256$\times$256 samples with a few hundred function evaluations.
  • Empirical results consistently show that CLD-based methods can outperform classic SDE samplers under fixed network and compute budgets (Dockhorn et al., 2021, Sterling et al., 26 Jun 2025, Du et al., 2022).
  • In privacy-focused applications such as robustness against membership-inference attacks, CLD increases the marginal entropy and the trace of the covariance of the noised data, leading to strictly lower Rényi divergences and improved privacy metrics (e.g., AUROC reduction from $\approx 0.9$ to $\approx 0.6$ on toy datasets) (Sterling et al., 17 Sep 2025).

7. Limitations, Tuning, and Open Directions

Although CLD achieves optimal (fastest possible non-oscillatory) mixing in quadratic settings, there is no reduction in dimension dependence for general data distributions under minimal assumptions; the asymptotic step-size requirement remains $h \lesssim 1/d$ (Chen et al., 2022). Introducing small position noise $\varepsilon > 0$ regularizes the process, enhancing ellipticity, smoothness, and empirical performance, but unavoidably increases the discretization constant, requiring delicate balancing in practice (Strasman et al., 4 Nov 2025). Practical tuning involves:

  • Selecting $\gamma$ to match the largest curvature direction (critical damping)
  • Adjusting the step size $h$ according to spectral properties for stability
  • Choosing $\varepsilon$ (when using the regularized CLD) to trade off mixing, discretization, and approximation errors
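For a quadratic potential the first rule has a closed form (illustrative curvatures below, unit mass assumed): each Hessian eigenmode $\lambda_i$ obeys $\ddot{\ell} + \gamma\dot{\ell} + \lambda_i \ell = 0$, so $\gamma = 2\sqrt{\lambda_{\max}}$ critically damps the stiffest mode and leaves softer modes overdamped (non-oscillatory but slower):

```python
import numpy as np

# Tuning gamma to the largest curvature (illustrative quadratic example):
# for U(x) = x^T H x / 2 with unit mass, critical damping of the stiffest
# Hessian mode requires gamma = 2 * sqrt(lambda_max).
H = np.diag([0.25, 1.0, 4.0])            # hypothetical curvatures
lam = np.linalg.eigvalsh(H)              # sorted ascending
gamma = 2.0 * np.sqrt(lam.max())         # gamma = 4 here

disc = gamma**2 - 4.0 * lam              # per-mode discriminants
print(gamma, disc)  # stiffest mode: 0 (critical); softer modes: > 0 (overdamped)
```

A positive discriminant means real, distinct eigenvalues for that mode, so no mode oscillates, but only the stiffest one decays at its optimal rate.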

Continued research investigates further generalizations to higher-order Langevin dynamics and critical damping strategies, offering promising avenues for improved efficiency and robustness in generative modeling (Sterling et al., 26 Jun 2025, Sterling et al., 2024).

