
Affine Invariant Langevin Dynamics (ALDI)

Updated 8 January 2026
  • Affine Invariant Langevin Dynamics (ALDI) is a framework using interacting particle SDEs with affine invariance to sample high-dimensional, anisotropic distributions effectively.
  • It employs ensemble covariance preconditioning to adapt both drift and diffusion, supporting gradient-based and gradient-free approaches in Bayesian inference and optimal design.
  • The method guarantees ergodicity with robust mixing properties, making it beneficial for inverse problems, rare-event estimation, and efficient experimental design.

Affine Invariant Langevin Dynamics (ALDI) is a class of interacting particle stochastic differential equations (SDEs) designed for sampling from high-dimensional, often anisotropic, target distributions with robust mixing properties and explicit invariance under affine transformations. Incorporating ensemble-based empirical covariance preconditioning, ALDI adapts both drift and diffusion to the evolving geometry of the sampled ensemble. ALDI admits gradient-based and gradient-free (derivative-free) formulations, making the methodology suitable for Bayesian inference, optimal experimental design, inverse problems, and rare-event estimation in settings where gradients of the log-target are unavailable or computationally prohibitive. The theory guarantees ergodicity and invariance under affine mappings provided the ensemble size exceeds the ambient dimension by at least one.

1. Mathematical Formulation

Let $\pi(\theta) \propto \exp(-U(\theta))$ define the target posterior, where $U$ is a (possibly non-convex, non-smooth) potential. Standard overdamped Langevin dynamics employ

$$d\theta_t = -\nabla U(\theta_t)\, dt + \sqrt{2}\, dW_t,$$

which exhibits slow mixing for ill-conditioned or anisotropic targets and requires explicit gradients.

ALDI generalizes this mechanism by evolving an ensemble $\{\theta^{(j)}_t \in \mathbb{R}^d\}_{j=1}^J$ via the empirically estimated ensemble mean and covariance,

$$\overline\theta_t = \frac{1}{J} \sum_{j=1}^J \theta^{(j)}_t, \qquad C_t = \frac{1}{J} \sum_{j=1}^J \big(\theta^{(j)}_t - \overline\theta_t\big)\big(\theta^{(j)}_t - \overline\theta_t\big)^T.$$

The canonical ALDI SDE reads

$$d\theta_t^{(j)} = -C_t\, \nabla U\big(\theta_t^{(j)}\big)\, dt \;+\; \frac{d+1}{J}\big(\theta_t^{(j)} - \overline\theta_t\big)\, dt \;+\; \sqrt{2}\, C_t^{1/2}\, dW_t^{(j)},$$

for independent $d$-dimensional Brownian motions $W_t^{(j)}$. The $(d+1)/J$ “repulsion drift” restores affine invariance and prevents ensemble collapse.

Discrete Euler–Maruyama time-stepping yields

$$\theta_{n+1}^{(j)} = \theta_n^{(j)} - \epsilon\, C_n\, \nabla U\big(\theta_n^{(j)}\big) + \epsilon\, \frac{d+1}{J}\big(\theta_n^{(j)} - \overline\theta_n\big) + \sqrt{2\epsilon}\, C_n^{1/2}\, \xi_n^{(j)},$$

with $\xi_n^{(j)} \sim N(0, I_d)$. The non-symmetric square root $C_n^{1/2}$ may be computed directly from the ensemble deviations, avoiding costly matrix factorizations (Garbuno-Inigo et al., 2019, Gruhlke et al., 17 Apr 2025).
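The deviation-based square root is simple to realize in code. The following sketch (illustrative, not library code) stacks the centered particles into a $d \times J$ matrix and divides by $\sqrt{J}$; the resulting non-symmetric factor reproduces the empirical covariance exactly, with no eigendecomposition or Cholesky factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
J, d = 8, 3                                  # ensemble larger than dimension
theta = rng.standard_normal((J, d))          # ensemble, one particle per row

mean = theta.mean(axis=0)
dev = theta - mean                           # centered particles, shape (J, d)
C = dev.T @ dev / J                          # empirical covariance, shape (d, d)

# Non-symmetric square root assembled directly from the deviations:
# root @ root.T equals C exactly, by construction.
root = dev.T / np.sqrt(J)                    # shape (d, J)
assert np.allclose(root @ root.T, C)
```

Note that with this factor the diffusion term drives each particle with a $J$-dimensional noise vector (`root @ xi` for `xi` in $\mathbb{R}^J$) rather than a $d$-dimensional one.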

Gradient-free ALDI replaces $C_t \nabla U$ with an ensemble Kalman-style cross-covariance update,

$$C_{\theta, G}\, \Gamma^{-1} \big(y - G(\theta)\big),$$

allowing sampling when only the forward model $G(\theta)$ is available (Gruhlke et al., 17 Apr 2025, Garbuno-Inigo et al., 2019, Chakraborty et al., 31 Dec 2025).
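As a hedged illustration of this drift (variable names and the linear test model are assumptions, not from the cited papers): the cross-covariance $C_{\theta,G}$ is assembled from forward-model evaluations alone, and for a linear model $G(\theta) = A\theta$ it reproduces the gradient drift $C\,\nabla U$ of the Gaussian misfit $U(\theta) = \tfrac12 (y - A\theta)^T \Gamma^{-1} (y - A\theta)$ exactly.

```python
import numpy as np

def kalman_drift(theta, G, y, Gamma_inv):
    """Gradient-free drift C_{theta,G} Gamma^{-1} (y - G(theta_j)), one row per particle."""
    J = theta.shape[0]
    Gs = np.array([G(t) for t in theta])                       # (J, k) forward evaluations
    Cxg = (theta - theta.mean(0)).T @ (Gs - Gs.mean(0)) / J    # (d, k) cross-covariance
    return (y - Gs) @ Gamma_inv.T @ Cxg.T                      # (J, d) drift per particle

rng = np.random.default_rng(0)
J, d, k = 10, 3, 2
theta = rng.standard_normal((J, d))
A = rng.standard_normal((k, d))
y = rng.standard_normal(k)
Gamma_inv = np.eye(k)

# For linear G the Kalman drift equals C A^T Gamma^{-1} (y - A theta_j),
# i.e. -C grad U for the Gaussian misfit potential above.
dev = theta - theta.mean(0)
C = dev.T @ dev / J
grad_drift = np.array([C @ A.T @ Gamma_inv @ (y - A @ t) for t in theta])
assert np.allclose(kalman_drift(theta, lambda t: A @ t, y, Gamma_inv), grad_drift)
```

For nonlinear $G$ the identity holds only approximately, which is the statistical-linearization price of going derivative-free.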

2. Affine Invariance Properties

A defining feature of ALDI is invariance under all invertible affine transformations $\theta \mapsto A\theta + b$, $A \in \mathbb{R}^{d \times d}$. The empirical covariance and drift/diffusion terms transform so that the SDE retains its form in the new basis:

$$C_\theta \mapsto A C_\theta A^T, \qquad \nabla_\theta U(\theta) \mapsto A^T \nabla_\vartheta U_A(\vartheta),$$

and both drift and diffusion are automatically re-scaled. The repulsion drift $(d+1)/J\,(\theta - \overline\theta)$ is manifestly affine-covariant.

As a consequence, ALDI achieves mixing rates independent of linear re-scalings, rotations, or anisotropy in $\pi(\theta)$, obviating the manual preconditioner tuning required by standard Langevin or MALA samplers (Gruhlke et al., 17 Apr 2025, Garbuno-Inigo et al., 2019, Beh et al., 25 Jun 2025).
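The covariance transformation rule is easy to verify numerically. This short sketch maps every particle through $\theta \mapsto A\theta + b$ and checks that the empirical covariance transforms to $A C_\theta A^T$ exactly, which is what makes the preconditioned drift and diffusion re-scale automatically.

```python
import numpy as np

def ens_cov(x):
    """Empirical ensemble covariance with 1/J normalization."""
    dev = x - x.mean(axis=0)
    return dev.T @ dev / x.shape[0]

rng = np.random.default_rng(0)
theta = rng.standard_normal((8, 3))      # ensemble: J=8 particles in R^3
A = rng.standard_normal((3, 3))          # generic (invertible) linear part
b = rng.standard_normal(3)               # shift

phi = theta @ A.T + b                    # every particle mapped by theta -> A theta + b

# Covariant transformation: C_phi = A C_theta A^T (exact, not just in expectation).
assert np.allclose(ens_cov(phi), A @ ens_cov(theta) @ A.T)
```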

3. Theoretical Properties and Consistency

Provided the empirical covariance is initially strictly positive-definite and $J > d+1$, ALDI admits unique global strong solutions and its ensemble law converges in total variation to the product target measure $\pi^{\otimes J}$ (Garbuno-Inigo et al., 2019, Beh et al., 25 Jun 2025, Chakraborty et al., 31 Dec 2025). Key results include:

  • Non-degeneracy: The ensemble covariance remains positive-definite for all $t$, enforced by the finite-$J$ correction drift.
  • Ergodicity: Under quadratic growth and smoothness conditions on $U$, the process is ergodic with a uniquely determined stationary distribution.
  • Affine invariance: Proven for both the continuous and the discretized SDE systems.
  • Small-noise consistency: For rare-event Bayesian posteriors, the ALDI-trajectory samples converge to the prior restricted to the failure set as the effective observation noise $R \to 0$, independently of the smoothing parameter $\delta$ (Chakraborty et al., 31 Dec 2025).

In the mean-field limit $J \to \infty$, the ALDI system induces a nonlinear gradient flow of the KL divergence with respect to a Wasserstein-type metric on the space of probability measures:

$$\partial_t \pi_t = \nabla \cdot \left( \pi_t\, C(\pi_t)\, \nabla \frac{\delta\, \mathrm{KL}(\pi_t \mid \pi_*)}{\delta \pi_t} \right),$$

where the covariance operator $C(\pi_t)$ is a function of the current empirical law (Garbuno-Inigo et al., 2019).

4. Algorithmic Implementation

Implementation is based on ensemble propagation via the Euler–Maruyama update, as detailed in the following generic pseudocode:

Initialize θ_0^{(j)} ∼ prior, j = 1,…,J
For n = 0 to N-1:
    Compute ensemble mean θ̄_n and covariance C_n
    For each j = 1,…,J:
        Draw ξ_n^{(j)} ∼ N(0, I_d)
        Update
        θ_{n+1}^{(j)} = θ_n^{(j)}
                        - ε C_n ∇U(θ_n^{(j)})
                        + ε (d+1)/J (θ_n^{(j)} - θ̄_n)
                        + sqrt(2ε) C_n^{1/2} ξ_n^{(j)}
End
Output θ_N^{(j)} as approximate samples from π
For derivative-free applications, substitute the ensemble Kalman cross-covariance formula for $C_n\, \nabla U$.
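A minimal runnable NumPy version of this pseudocode might look as follows. It is an illustrative sketch, not a reference implementation: the target, step size, and ensemble size are assumed values, and it uses the deviation-based square root, so each particle is driven by $J$-dimensional noise.

```python
import numpy as np

def aldi_sample(grad_U, theta0, eps=1e-2, n_steps=2000, seed=0):
    """Euler-Maruyama ALDI: preconditioned gradient, repulsion drift, ensemble noise."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()                        # (J, d) ensemble
    J, d = theta.shape
    for _ in range(n_steps):
        mean = theta.mean(axis=0)
        dev = theta - mean                       # (J, d) deviations
        C = dev.T @ dev / J                      # (d, d) ensemble covariance
        root = dev.T / np.sqrt(J)                # (d, J): root @ root.T == C
        grads = np.array([grad_U(t) for t in theta])
        drift = -grads @ C + (d + 1) / J * dev   # -C grad U (C symmetric) + repulsion
        xi = rng.standard_normal((J, J))         # one R^J noise vector per particle
        theta = theta + eps * drift + np.sqrt(2 * eps) * xi @ root.T
    return theta

# Hypothetical anisotropic Gaussian target: U(t) = 0.5 t^T Sigma_inv t.
Sigma_inv = np.diag([0.25, 4.0])
grad_U = lambda t: Sigma_inv @ t
theta0 = np.random.default_rng(1).standard_normal((10, 2))   # J=10 > d+1=3
samples = aldi_sample(grad_U, theta0)
```

Because `C` is symmetric, `grads @ C` computes $C\,\nabla U(\theta^{(j)})$ row-wise; the repulsion and noise terms likewise vectorize over the ensemble.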

Computational costs scale as $O(J d^2)$ per ensemble step for covariance estimation, with $O(d^3)$ for square-root computation, assuming no low-rank or diagonal approximation. In practice, $J \sim 2$–$5\,d$ is sufficient (Gruhlke et al., 17 Apr 2025, Eigel et al., 2022). Stability requires $\epsilon \lesssim 1/\lambda_{\max}(C_n\, \mathrm{Hess}\,U)$, but ALDI achieves greater numerical stability than isotropic schemes due to adaptive preconditioning.
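The stability bound can be probed cheaply. This hypothetical two-dimensional example (the Hessian and covariance values are made up for illustration) computes the largest admissible step size from the spectrum of $C_n\, \mathrm{Hess}\,U$.

```python
import numpy as np

# Hypothetical Gaussian target U(t) = 0.5 t^T H t, so Hess U = H everywhere.
H = np.diag([4.0, 0.25])                 # anisotropic curvature (assumed values)
C = np.diag([0.3, 2.0])                  # current ensemble covariance (assumed values)

# eps should stay below 1/lambda_max(C_n Hess U) for the preconditioned
# drift to remain stable under Euler-Maruyama discretization.
lam_max = np.max(np.abs(np.linalg.eigvals(C @ H)))   # C @ H = diag(1.2, 0.5)
eps_max = 1.0 / lam_max                              # = 1/1.2
```

Note how the preconditioning helps: wherever the ensemble covariance approximates the inverse Hessian, $C_n\,\mathrm{Hess}\,U \approx I$ and the bound stops depending on the target's conditioning.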

Adaptive ensemble enrichment strategies (e.g., LIDL) successively enlarge the particle set using random kicks, diffusion-only propagation, or transport maps, reducing costly forward evaluations for Bayesian inverse problems and achieving consistency in linear-Gaussian settings (Eigel et al., 2022).

5. Applications in Bayesian Inverse Problems and Rare Event Estimation

ALDI is extensively applied in Bayesian inference for complex forward models (e.g., PDE-constrained inverse problems) and in rare-event probability estimation, where the optimal importance sampling density is often non-differentiable (Beh et al., 25 Jun 2025, Chakraborty et al., 31 Dec 2025). For demonstration:

  • Bayesian Experimental Design (BOED): ALDI provides efficient, derivative-free posterior sampling for utility estimation, enabling scalable information-driven design in high dimensions (Gruhlke et al., 17 Apr 2025).
  • Rare-event sampling: ALDI is used to sample smoothed zero-variance densities derived by logistic or ramp approximations of limit-state functions, yielding proposal densities for importance sampling and error bounds for rare-event probabilities. Trade-offs in smoothing and discretization bias are rigorously characterized, with ALDI-based samplers outperforming MH or HMC subset simulation frameworks in high-dimensional settings (Beh et al., 25 Jun 2025, Chakraborty et al., 31 Dec 2025).
  • PDE-based Bayesian inverse problems: Both gradient-based and gradient-free ALDI samplers produce comparable bias and spread, with the gradient-free variant requiring substantially fewer gradient calls for large parameter dimensions (Garbuno-Inigo et al., 2019).

6. Numerical Performance and Extensions

Empirical studies confirm ALDI’s robust mixing and accuracy in linear Gaussian, nonlinear, and multimodal posteriors:

| Test Case | Dimensionality | Ensemble Size | Key Finding |
| --- | --- | --- | --- |
| Linear Gaussian | $d \leq 256$ | $J = 50$–$256$ | Posterior recovered; rapid mixing |
| PDE-based Darcy flow | $d = 50, 101$ | $J = 25$–$200$ | Gradient-free ALDI matches bias of gradient-based |
| Rare-event hyperplane | $d = 100$ | $M = 50$ | nRMSE $\approx 0.18$; outperforms MH/HMC-SuS |
| Atmospheric blocking ODE | nonlinear | $J = 10^2$–$10^3$ | Gaussian-mixture IS achieves variance reduction |

Increasing ensemble size improves accuracy and empirical consistency, and key metrics such as the Sinkhorn divergence decay as $J^{-1/2}$. Homotopy extensions and enrichment schemes address multimodality and reduce forward-model costs by up to $50\%$ compared to full ALDI runs (Eigel et al., 2022).

ALDI’s primary limitations are the computational cost of covariance square-root updates in very high dimensions and imperfect ergodicity when $J \leq d+1$ or in gradient-free implementations with complex forward models. Discretization bias must be controlled with small step sizes; mixture-based importance sampling can overfit with small $J$; and gradient-free ergodicity is not yet theoretically established (Chakraborty et al., 31 Dec 2025). ALDI is best suited to moderate-dimensional problems with strong anisotropy and expensive or unavailable gradients, and as a geometry-probing phase for proposal density construction.

Recommended applications include Bayesian experimental design, rare-event sampling near failure manifolds, PDE inference with black-box forward models, and scenarios requiring adaptive proposal distributions conditioned on informative subspaces. In all cases, ALDI’s affine invariance ensures robust performance under challenging geometric conditions.
