
Critically-Damped Langevin Diffusion

Updated 17 January 2026
  • Critically-Damped Langevin Diffusion is a stochastic differential framework that augments traditional diffusion with momentum variables to achieve the fastest non-oscillatory convergence.
  • It improves theoretical mixing rates and sampling efficiency in score-based generative modeling, with empirical demonstrations on image synthesis and privacy metrics.
  • Employing discretization methods such as symmetric splitting and Euler–Maruyama, CLD controls discretization error while delivering state-of-the-art generative performance.

Critically-Damped Langevin Diffusion (CLD) is a stochastic differential equation (SDE) framework used within score-based generative modeling and diffusion processes. By introducing auxiliary velocity (or momentum) variables and tuning frictional parameters to a critical value, CLD achieves the fastest non-oscillatory convergence to equilibrium in linear settings. This approach augments standard diffusion models and enables efficient data sampling, improved theoretical mixing rates, and new privacy properties.

1. Mathematical Formulation

CLD augments each data sample $x_t \in \mathbb{R}^d$ with an auxiliary velocity $v_t \in \mathbb{R}^d$, creating a phase space $u_t = (x_t, v_t) \in \mathbb{R}^{2d}$. The standard forward (noising) SDE is

$$
\begin{cases}
dx_t = M^{-1} v_t\,\beta\,dt \\
dv_t = -\nabla_{x_t}\!\left(\tfrac12\|x_t\|^2\right)\beta\,dt - \Gamma M^{-1} v_t\,\beta\,dt + \sqrt{2\Gamma\beta}\,dW_t
\end{cases}
$$

or in vector-matrix form,

$$
d\begin{pmatrix} x_t \\ v_t \end{pmatrix} = \left[\begin{pmatrix} M^{-1} v_t \\ -x_t \end{pmatrix}\beta + \begin{pmatrix} 0 \\ -\Gamma M^{-1} v_t \end{pmatrix}\beta\right] dt + \begin{pmatrix} 0 \\ \sqrt{2\Gamma\beta} \end{pmatrix} dW_t.
$$

Here, $M$ is a positive mass scalar, $\Gamma$ the friction parameter, and $\beta$ a time-scaling constant. The “critical damping” regime corresponds to $\Gamma^2 = 4M$ (Dockhorn et al., 2021), yielding the fastest return to equilibrium without oscillations.
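As a concrete illustration (a toy numpy sketch with illustrative parameters $\beta = 1$, $M = 1$, $\Gamma = 2$, not drawn from any of the cited papers), an Euler–Maruyama simulation of this forward SDE shows the joint $(x, v)$ law relaxing to the stationary $\mathcal{N}(0, I_2)$:

```python
import numpy as np

# Minimal Euler-Maruyama simulation of the forward CLD SDE with
# beta = 1, M = 1, Gamma = 2 (critical damping); all parameter
# choices here are illustrative.
rng = np.random.default_rng(0)
n, dt, steps = 20000, 0.01, 1000          # particles, step size, horizon T = 10

x = np.ones(n)                            # "data": a point mass at x = 1
v = np.zeros(n)                           # velocities start at rest
for _ in range(steps):
    dW = rng.standard_normal(n) * np.sqrt(dt)
    x, v = (x + v * dt,
            v + (-x - 2.0 * v) * dt + 2.0 * dW)   # sqrt(2*Gamma*beta) = 2

# After T = 10 the joint law should be close to N(0, I_2).
print(x.mean(), x.var(), v.var())
```

Running this yields a sample mean near $0$ and sample variances near $1$ for both coordinates, consistent with the stationary law, up to the small $O(dt)$ bias that Euler–Maruyama introduces in the stationary variances.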

The corresponding reverse-time SDE (the key object for generative modeling), written for $M = 1$, $\Gamma = 2$, $\beta = 1$, is

$$
\begin{cases}
dx_t = -v_t\,dt \\
dv_t = \left[x_t + 2 v_t + 4\,\nabla_v \log q_{T-t}(x_t, v_t)\right] dt + 2\,dB_t
\end{cases}
$$

with the learned “velocity-score” $\nabla_v \log q_t$ replaced by a neural estimate $s_t$ on time intervals $[kh, (k+1)h]$ (Chen et al., 2022).

2. Theoretical Properties and Mixing Rates

Critical damping is derived from the characteristic polynomial of the deterministic part: $\lambda^2 + \Gamma\lambda + 1 = 0$. Setting $\Gamma = 2$ for $M = 1$ gives coincident eigenvalues at $-1$, guaranteeing the fastest non-oscillatory convergence (Du et al., 2022). In the linear SDE, this yields a unique stationary law $\mathcal{N}(0, I_{2d})$ and, by Fokker–Planck analysis, maximizes the spectral gap among all choices of $\Gamma$ for fixed stiffness (Sterling et al., 17 Sep 2025).
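This eigenvalue claim is easy to verify numerically; the sketch below (with $M = 1$, $\beta = 1$) computes the spectral abscissa of the deterministic drift for under-, critically, and over-damped choices of $\Gamma$:

```python
import numpy as np

def spectral_abscissa(gamma):
    """Largest real part of the eigenvalues of the deterministic CLD
    drift [[0, 1], [-1, -gamma]] (M = 1, beta = 1); more negative
    means faster decay to equilibrium."""
    A = np.array([[0.0, 1.0], [-1.0, -gamma]])
    return np.linalg.eigvals(A).real.max()

under, critical, over = (spectral_abscissa(g) for g in (1.0, 2.0, 4.0))
print(under, critical, over)   # critical damping decays fastest
```

The underdamped choice $\Gamma = 1$ gives decay rate $1/2$, the overdamped $\Gamma = 4$ gives $2 - \sqrt{3} \approx 0.27$, and the critical $\Gamma = 2$ gives the best rate $1$.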

For strongly convex quadratic targets, CLD guarantees exponential convergence of the joint $(x_t, v_t)$ law to equilibrium at the optimal rate, determined by the critically damped linear drift. In the generalized CLD with position-noise regularization (elliptic extension),

$$
\begin{cases}
dX_t = V_t\,dt + \varepsilon\,dB_t \\
dV_t = -\gamma V_t\,dt - \nabla U(X_t)\,dt + \sigma\,dW_t
\end{cases}
$$

one can optimize the additional noise $\varepsilon$ to balance mixing speed and discretization error (Strasman et al., 4 Nov 2025).
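For a concrete quadratic example (hypothetical values $U(x) = x^2/2$, $\gamma = 2$, $\sigma = 2$, $\varepsilon = 0.5$, chosen for illustration rather than taken from the cited paper), the stationary covariance of this system can be obtained by integrating the Lyapunov ODE $\Sigma' = A\Sigma + \Sigma A^\top + GG^\top$ to its fixed point; the position noise $\varepsilon$ inflates the stationary covariance relative to the $\varepsilon = 0$ case, where it would be exactly $I$:

```python
import numpy as np

# Stationary covariance of the regularized CLD (illustrative values:
# quadratic U(x) = x^2/2, gamma = 2, sigma = 2, position noise eps = 0.5).
gamma, sigma, eps = 2.0, 2.0, 0.5
A = np.array([[0.0, 1.0], [-1.0, -gamma]])          # drift of (X, V)
GGt = np.diag([eps**2, sigma**2])                   # diffusion covariance

# Integrate the Lyapunov ODE  Sigma' = A Sigma + Sigma A^T + G G^T
# with small Euler steps; its fixed point is the stationary covariance.
Sig = np.zeros((2, 2))
dt = 0.002
for _ in range(25000):                              # horizon t = 50
    Sig = Sig + dt * (A @ Sig + Sig @ A.T + GGt)

print(Sig)   # eps > 0 perturbs the eps = 0 stationary law N(0, I)
```

For this linear case the result can be checked against the closed form $\Sigma_{xv} = -\varepsilon^2/2$, $\Sigma_{vv} = (\sigma^2 + \varepsilon^2)/(2\gamma)$, $\Sigma_{xx} = \Sigma_{vv} + \gamma\varepsilon^2/2$, obtained by solving the stationary Lyapunov equation by hand.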

3. Score-Based Generative Modeling and Training Objectives

In CLD-based score-based generative models (SGMs), only the velocity variable is directly noised. The associated conditional density $p_t(v_t \mid x_t)$ remains closer to Gaussian throughout the process, making the required score-matching task (learning $\nabla_v \log p_t(v \mid x)$) simpler than in purely overdamped cases (Dockhorn et al., 2021). The continuous-time score-matching loss is

$$
\mathcal{L}(\theta) = \mathbb{E}_{t,\, u_t \sim p_t}\left[\lambda(t)\,\big\| s_\theta(u_t, t) - \nabla_{u_t} \log p_t(u_t) \big\|^2\right],
$$

which reduces to a ‘hybrid’ denoising-style score-matching objective over the velocity conditional. The theoretical framework generalizes standard variational perspectives by allowing parameterized forward SDEs and using block-constant Riemannian + symplectic drift structures (Du et al., 2022).
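The denoising reduction can be illustrated with a toy Gaussian perturbation kernel (the numbers below are arbitrary; the actual CLD kernel is also Gaussian, with mean and covariance determined by the linear SDE): writing $u = \mu + L\epsilon$ with $\Sigma = LL^\top$, the conditional score is $-L^{-\top}\epsilon$, and the Monte-Carlo loss vanishes exactly for that score:

```python
import numpy as np

# Toy denoising score-matching check for a Gaussian perturbation kernel
# N(mu, Sigma); mu and Sigma here are illustrative placeholders.
rng = np.random.default_rng(0)
mu = np.array([0.5, -0.2])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
L = np.linalg.cholesky(Sigma)
Linv = np.linalg.inv(L)

eps = rng.standard_normal((4096, 2))
u = mu + eps @ L.T                      # samples u ~ N(mu, Sigma)

def dsm_loss(score_fn):
    """Monte-Carlo denoising score-matching loss: for u = mu + L eps
    the regression target is the conditional score -L^{-T} eps."""
    target = -eps @ Linv                # rows are -L^{-T} eps
    return np.mean(np.sum((score_fn(u) - target) ** 2, axis=1))

exact = lambda w: -(w - mu) @ np.linalg.inv(Sigma).T   # true Gaussian score
zero = lambda w: np.zeros_like(w)                      # trivial baseline
print(dsm_loss(exact), dsm_loss(zero))  # exact conditional score -> loss 0
```

Since $-\Sigma^{-1}(u - \mu) = -L^{-\top}\epsilon$ identically, the loss for the exact score is zero up to floating-point rounding, while any other model incurs a strictly positive penalty.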

4. Sampling Algorithms and Discretization

Sampling from the CLD-based reverse process can be discretized using either Euler–Maruyama (Du et al., 2022, Sterling et al., 17 Sep 2025) or more advanced symmetric splitting methods. The Symmetric Splitting CLD Sampler (SSCS) decomposes the reverse-SDE generator into an affine “OU” step (solved exactly) and a score-driven ODE in the $v$ coordinates, iterating via Strang splitting (Dockhorn et al., 2021). Compared to simple Euler–Maruyama, the SSCS approach yields higher sampling accuracy and requires fewer function evaluations to achieve similar generative quality (Dockhorn et al., 2021).

In the context of minimal data assumptions, under an $L^2$-accurate learned score, optimal convergence to the target law (in total variation or $W_2$ Wasserstein distance) is guaranteed (Chen et al., 2022, Strasman et al., 4 Nov 2025). The discretization error and the choice of step size $h \lesssim 1/L$ are crucial; theoretical analysis bounds the number of required steps as $N = \widetilde{\Theta}(L^2 d / \varepsilon^2)$ to achieve error $\varepsilon$ (Chen et al., 2022).

5. Comparison to Overdamped and Higher-Order Langevin Diffusions

CLD extends the classical overdamped Langevin setting ($dx = -\nabla U(x)\,dt + \sqrt{2}\,dW_t$) to include explicit momentum coupling. In strongly convex and $L$-smooth settings, the mixing time for CLD scales as $O(\sqrt{\kappa}\log(1/\varepsilon))$ in the condition number $\kappa$, compared to $O(\kappa\log(1/\varepsilon))$ for the overdamped case (Sterling et al., 26 Jun 2025). However, in the SGM setting with learned scores, CLD does not improve the dependence on the ambient dimension $d$ relative to classical DDPM, because the velocity-score estimation error accumulates linearly in $d$ (Chen et al., 2022).

Recent work generalizes CLD to third-order (TOLD++) and higher-order Langevin diffusions (HOLD++), where critical damping of the system matrix maximizes contraction rate and leads to further reductions in mixing times and improved FID on real and synthetic datasets (Sterling et al., 2024, Sterling et al., 26 Jun 2025).

6. Applications and Empirical Performance

CLD-based SGMs have achieved state-of-the-art generative quality on large-scale datasets. Examples:

  • On CIFAR-10, with an approximately 100M-parameter U-Net and 2000 function evaluations, CLD-SGM achieves FID $\approx 2.23$ (reverse SDE) and $\approx 2.25$ (probability flow ODE) (Dockhorn et al., 2021).
  • On CelebA-HQ-256, CLD combined with SSCS produces high-fidelity 256$\times$256 samples with a few hundred function evaluations.
  • Empirical results consistently show that CLD-based methods can outperform classic SDE samplers under fixed network and compute budgets (Dockhorn et al., 2021, Sterling et al., 26 Jun 2025, Du et al., 2022).
  • In privacy-focused applications such as robustness against membership-inference attacks, CLD increases the marginal entropy and the trace of the covariance of the noised data, leading to strictly lower Rényi divergences and improved privacy metrics (e.g., AUROC reduction from $\approx 0.9$ to $\approx 0.6$ on toy datasets) (Sterling et al., 17 Sep 2025).

7. Limitations, Tuning, and Open Directions

Although CLD achieves optimal (fastest possible non-oscillatory) mixing in quadratic settings, there is no reduction in dimension dependence for general data distributions under minimal assumptions; the asymptotic step-size requirement remains $h \lesssim 1/d$ (Chen et al., 2022). Introducing small position noise $\varepsilon > 0$ regularizes the process, enhancing ellipticity, smoothness, and empirical performance, but unavoidably increases the discretization constant, requiring delicate balancing in practice (Strasman et al., 4 Nov 2025). Practical tuning involves:

  • Selecting $\gamma$ to match the largest curvature direction (critical damping)
  • Adjusting the step size $h$ according to spectral properties for stability
  • Choosing $\varepsilon$ (when using the regularized CLD) to trade off mixing, discretization, and approximation errors
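For a quadratic potential the first rule has a closed form (illustrative curvatures below, unit mass assumed): each Hessian eigenmode $\lambda_i$ obeys $\ddot{\ell} + \gamma\dot{\ell} + \lambda_i \ell = 0$, so $\gamma = 2\sqrt{\lambda_{\max}}$ critically damps the stiffest mode and leaves softer modes overdamped (non-oscillatory but slower):

```python
import numpy as np

# Tuning gamma to the largest curvature (illustrative quadratic example):
# for U(x) = x^T H x / 2 with unit mass, critical damping of the stiffest
# Hessian mode requires gamma = 2 * sqrt(lambda_max).
H = np.diag([0.25, 1.0, 4.0])            # hypothetical curvatures
lam = np.linalg.eigvalsh(H)              # sorted ascending
gamma = 2.0 * np.sqrt(lam.max())         # gamma = 4 here

disc = gamma**2 - 4.0 * lam              # per-mode discriminants
print(gamma, disc)  # stiffest mode: 0 (critical); softer modes: > 0 (overdamped)
```

A positive discriminant means real, distinct eigenvalues for that mode, so no mode oscillates, but only the stiffest one decays at its optimal rate.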

Continued research investigates further generalizations to higher-order Langevin dynamics and critical damping strategies, offering promising avenues for improved efficiency and robustness in generative modeling (Sterling et al., 26 Jun 2025, Sterling et al., 2024).

