MALA: Metropolis-adjusted Langevin Algorithm

Updated 16 February 2026

MALA is a Markov chain Monte Carlo method that uses gradient information in its proposals, which improves sampling efficiency in high dimensions.
It combines an Euler–Maruyama discretization of the Langevin diffusion with a Metropolis–Hastings step that corrects the discretization bias, so that the target distribution is exactly invariant.
Variants such as proximal MALA, MASLA, and geometric adaptations extend it to non-smooth, high-dimensional, and manifold-constrained sampling problems.

The Metropolis-adjusted Langevin Algorithm (MALA) is a Markov chain Monte Carlo (MCMC) method that constructs reversible Markov chains targeting a chosen distribution on high-dimensional Euclidean space or manifolds. MALA integrates information from the gradient of the log-target density into proposal moves and corrects for discretization bias by a Metropolis–Hastings accept-reject mechanism. Variants based on convex analysis, subdifferential calculus, local or global adaptation, and geometric preconditioning have broadened the scope of MALA to non-smooth, high-dimensional, or non-Euclidean sampling regimes.

1. Foundations and Algorithmic Structure

MALA is derived from the Euler–Maruyama discretization of the overdamped Langevin diffusion targeting a distribution $\pi(x)\propto\exp(-U(x))$: $dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dW_t,$ where $W_t$ denotes standard Brownian motion. With step size $h>0$, which corresponds to an Euler–Maruyama step of length $h/2$ for this diffusion, a single MALA proposal is given by

$X' = X - \frac{h}{2}\nabla U(X) + \sqrt{h}Z,\qquad Z\sim\mathcal{N}(0, I_d).$

The proposed $X'$ is accepted with probability

$\alpha(X, X') = 1 \wedge \frac{\pi(X')q(X' \to X)}{\pi(X)q(X \to X')},$

where $q(x\to y)$ is the Gaussian proposal density centered at $x-(h/2)\nabla U(x)$ with covariance $hI_d$. The Metropolis–Hastings correction ensures that $\pi$ is the unique invariant distribution.
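The proposal and acceptance rule above translate directly into code. The following is a minimal NumPy sketch, assuming the user supplies `log_pi` (the log-target $-U$ up to an additive constant) and its gradient `grad_log_pi`; the function names are illustrative and not taken from any particular library.

```python
import numpy as np


def mala_step(x, log_pi, grad_log_pi, h, rng):
    """One MALA transition; returns (next_state, accepted)."""

    def log_q(to, frm):
        # Log of the Gaussian proposal density q(frm -> to), up to a constant:
        # mean frm + (h/2) grad log pi(frm) = frm - (h/2) grad U(frm), covariance h * I.
        mean = frm + 0.5 * h * grad_log_pi(frm)
        return -np.sum((to - mean) ** 2) / (2.0 * h)

    # Langevin proposal X' = X - (h/2) grad U(X) + sqrt(h) Z
    prop = x + 0.5 * h * grad_log_pi(x) + np.sqrt(h) * rng.standard_normal(x.shape)
    # Metropolis-Hastings log-ratio: log[pi(X') q(X' -> X)] - log[pi(X) q(X -> X')]
    log_alpha = log_pi(prop) + log_q(x, prop) - log_pi(x) - log_q(prop, x)
    if rng.uniform() < np.exp(min(0.0, log_alpha)):
        return prop, True
    return x, False


def run_mala(x0, log_pi, grad_log_pi, h, n_iter, seed=0):
    """Run n_iter MALA steps from x0; returns (samples, acceptance_rate)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples, n_accepted = [], 0
    for _ in range(n_iter):
        x, accepted = mala_step(x, log_pi, grad_log_pi, h, rng)
        n_accepted += accepted
        samples.append(x)  # x is never modified in place, so no copy is needed
    return np.array(samples), n_accepted / n_iter
```

For a standard Gaussian target in $d=10$, `run_mala(np.zeros(10), lambda x: -0.5 * np.sum(x**2), lambda x: -x, h=0.5, n_iter=5000)` should return samples whose empirical mean and variance are close to 0 and 1 once a burn-in is discarded.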

The optimal acceptance rate for high-dimensional product targets, and for suitable non-product targets, is approximately $0.574$. The optimal step size scales as $h\propto d^{-1/3}$ in stationarity for a $d$-dimensional system, and as $h\propto d^{-1/2}$ out of stationarity (Pillai et al., 2011, Kuntz et al., 2016, Pillai, 2022).
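In practice the step size is often tuned during a warm-up phase so that the average acceptance probability approaches the asymptotic optimum $0.574$, starting from the scaling $h=\ell^2 d^{-1/3}$. The sketch below uses a simple Robbins–Monro update on $\log h$; the constant `ell`, the gain sequence $t^{-0.6}$, and the function names are illustrative assumptions, not a prescribed scheme from the cited works.

```python
import numpy as np

TARGET_ACCEPT = 0.574  # asymptotically optimal MALA acceptance rate


def mala_step(x, log_pi, grad_log_pi, h, rng):
    """One MALA transition; returns (next_state, acceptance_probability)."""
    mean_fwd = x + 0.5 * h * grad_log_pi(x)        # x - (h/2) grad U(x)
    prop = mean_fwd + np.sqrt(h) * rng.standard_normal(x.shape)
    mean_bwd = prop + 0.5 * h * grad_log_pi(prop)  # mean of the reverse-move proposal
    log_alpha = (log_pi(prop) - log_pi(x)
                 - np.sum((x - mean_bwd) ** 2) / (2.0 * h)
                 + np.sum((prop - mean_fwd) ** 2) / (2.0 * h))
    alpha = float(np.exp(min(0.0, log_alpha)))
    return (prop if rng.uniform() < alpha else x), alpha


def tune_step_size(x0, log_pi, grad_log_pi, n_warmup=2000, ell=1.0, seed=0):
    """Warm-up adaptation of h toward TARGET_ACCEPT; returns (h, last_state)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    log_h = np.log(ell ** 2 * x.size ** (-1.0 / 3.0))  # initial guess h = ell^2 d^(-1/3)
    for t in range(1, n_warmup + 1):
        x, alpha = mala_step(x, log_pi, grad_log_pi, np.exp(log_h), rng)
        # Grow h when accepting too often, shrink it when accepting too rarely.
        log_h += t ** -0.6 * (alpha - TARGET_ACCEPT)
    return float(np.exp(log_h)), x
```

The returned `h` is then held fixed for the sampling phase; keeping it fixed after warm-up means the production run is an ordinary, non-adaptive MALA chain, so the standard invariance argument applies unchanged.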

2. Theoretical Performance and High-dimensional Scaling

The efficiency and mixing time of MALA have been rigorously analyzed in both product and non-product settings. For targets with i.i.d. structure and sufficiently smooth potentials, running MALA in stationarity with step size $h\propto d^{-1/3}$ yields an exploration time of order $O(d^{1/3})$ iterations for the invariant measure (Pillai et al., 2011, Pillai, 2022).

If the target is strongly log-concave and log-smooth, i.e. $mI_d \preceq \nabla^2 U \preceq LI_d$ for some $0<m\le L$, non-asymptotic analysis shows that, from a warm start, the mixing time of MALA scales as $d^{1/2}$ in the dimension $d$, up to polylogarithmic factors, and that this rate is optimal, with accuracy measured in total variation, KL, or Wasserstein distance (Chewi et al., 2020). This resolves the gap between the diffusion-limit intuition ($d^{1/3}$) and the prior non-asymptotic $O(d)$ bounds.

For Bayesian posteriors that are close to multivariate Gaussians, a refined $s$-conductance profile technique shows that, after burn-in, MALA achieves the optimal dependence of the mixing time on the dimension $d$ (of order $d^{1/3}$) and on the condition number $\kappa$ (linear in $\kappa$) (Tang et al., 2022).

When the log-target admits only weak higher-order regularity (incoherence-type assumptions, not global operator norm bounds), sublinear-in-dimension mixing for weakly log-concave and certain nonconvex targets is achievable (Mangoubi et al., 2019).

3. Algorithmic Variants and Extensions

Several sophisticated extensions of MALA handle challenges arising from non-smooth potentials, sparsity, geometry, and adaptivity.

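Proximal MALA: Suitable for log-concave, possibly non-smooth $U$. Replaces the gradient step $x-\frac{h}{2}\nabla U(x)$ in the proposal by the proximity operator $\mathrm{prox}_U^{h/2}(x)=\arg\min_u\{U(u)+\frac{1}{h}\|u-x\|^2\}$, which agrees with the gradient step to first order in $h$ when $U$ is smooth. Accept/reject is performed using the Gaussian proposal densities centered at $\mathrm{prox}_U^{h/2}(\cdot)$ (Pereyra, 2013, Pillai, 2022). Proximal MALA achieves robust geometric ergodicity and matches the optimal scaling asymptotics of gradient MALA.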
Metropolis-adjusted Subdifferential Langevin Algorithm (MASLA): For locally Lipschitz or nonconvex, non-differentiable $U$. Replaces gradients by elements of a measurable conservative field (set-valued extension of subgradient), ensuring stationarity and reversibility almost everywhere. Empirical results confirm geometric ergodicity and ability to sample from densities inaccessible to classical or proximal MCMC (Ning, 9 Jul 2025).
Shrinkage–Thresholding MALA (STMALA): Designed for sparse Bayesian variable selection, combines MALA with a coordinate-wise soft-thresholding operator to effect variable-inclusion moves. Achieves V-geometric ergodicity and superior exploration compared to both basic MALA and reversible-jump MCMC in sparse high-dimensional regimes (Schreck et al., 2013).
Fisher-adaptive and Geometric Preconditioning: Fisher-adaptive MALA (FaMALA) and related approaches employ the inverse Fisher information matrix as an optimal preconditioner, learning this matrix adaptively using streaming estimates from the history of gradients. This preconditioning maximizes the expected squared jump distance and leads to dimension-robust efficiency superior to standard adaptive MALA and the preconditioned Crank–Nicolson (pCN) sampler (Titsias, 2023, Wang et al., 12 Mar 2025). Geometric and manifold variants use local metric tensors or stochastic development for sampling in non-Euclidean geometries (Xifara et al., 2013, Mamajiwala et al., 2022).
Backpropagation-Free MALA: Uses forward-mode automatic differentiation and randomly sampled search directions to obviate the need for reverse-mode gradient computation in machine learning settings, while maintaining detailed balance via modified proposals and acceptance rules (Cobb et al., 23 May 2025).
Locally Adaptive Step Size (autoMALA): Adjusts the proposal step size at each iteration based on local characteristics of the target, preserving invariance by augmenting state with random thresholds and enforcing reversibility through acceptance interval tests (Biron-Lattes et al., 2023).
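As a concrete instance of the proximal variant, consider the Laplace-type target $\pi(x)\propto\exp(-\lambda\|x\|_1)$. Here $U=\lambda\|\cdot\|_1$ is convex but not differentiable at zero, and the proximity operator $\mathrm{prox}_U^{h/2}(x)=\arg\min_u\{U(u)+\frac{1}{h}\|u-x\|^2\}$ is coordinate-wise soft-thresholding at level $\lambda h/2$. The NumPy sketch below assumes this particular target and uses the illustrative names `prox_l1` and `pmala_laplace`; it shows only how the proposal center changes, while the accept/reject step keeps the usual form.

```python
import numpy as np


def prox_l1(x, t):
    """Proximity operator of t * ||.||_1: coordinate-wise soft-thresholding at level t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def pmala_laplace(x0, lam, h, n_iter, seed=0):
    """Proximal MALA sketch for pi(x) proportional to exp(-lam * ||x||_1).

    Proposal: y ~ N(prox_U^{h/2}(x), h I) with U = lam * ||.||_1, i.e. the
    gradient step x - (h/2) grad U(x) is replaced by soft-thresholding.
    """
    rng = np.random.default_rng(seed)
    U = lambda z: lam * np.sum(np.abs(z))
    center = lambda z: prox_l1(z, lam * h / 2.0)
    # log q(frm -> to) up to a constant (the covariance h * I does not depend on frm)
    log_q = lambda to, frm: -np.sum((to - center(frm)) ** 2) / (2.0 * h)
    x = np.asarray(x0, dtype=float)
    samples, accepted = np.empty((n_iter, x.size)), 0
    for i in range(n_iter):
        y = center(x) + np.sqrt(h) * rng.standard_normal(x.size)
        log_alpha = U(x) - U(y) + log_q(x, y) - log_q(y, x)
        if rng.uniform() < np.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        samples[i] = x
    return samples, accepted / n_iter
```

For $\lambda=1$ each coordinate is standard Laplace, so the post-burn-in average of $|x_i|$ should be close to $1/\lambda=1$.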

4. Non-Asymptotic Mixing, Conductance, and Robustness

Recent analyses have established quantitative non-asymptotic convergence guarantees for MALA under relaxed smoothness, log-concavity, and isoperimetric assumptions. In particular, from a warm start, the number of iterations needed to reach accuracy $\varepsilon$ in total variation can be bounded as $O\big(\frac{(L\Upsilon)^{1/2}}{\psi_\mu^2}\log\frac{1}{\varepsilon}\big)$, where $L$ bounds the operator norm of the Hessian of $U$, $\Upsilon$ bounds its trace, and $\psi_\mu$ is the isoperimetric constant of the target (Chen et al., 2023). The dependence on $\Upsilon$ can yield improved dimension scaling in models with low-rank or sparse curvature.
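To make the role of the trace concrete, write $L$ for the Hessian operator-norm bound, $\Upsilon$ for the Hessian trace bound, and $\psi_\mu$ for the isoperimetric constant, so that the bound of Chen et al. (2023) scales as $(L\Upsilon)^{1/2}/\psi_\mu^2$. As an illustrative assumption (not a result of the cited work), take a log-concave target with $0\preceq\nabla^2U(x)\preceq LI_d$. Since the trace of a positive semidefinite matrix is at most $d$ times its largest eigenvalue, the worst case is

$\Upsilon = dL \;\Longrightarrow\; \frac{(L\Upsilon)^{1/2}}{\psi_\mu^2} = \frac{L\,d^{1/2}}{\psi_\mu^2},$

whereas if the curvature has effective rank $r$, in the sense that $\operatorname{tr}\nabla^2U(x)\le rL$ for all $x$,

$\Upsilon = rL \;\Longrightarrow\; \frac{(L\Upsilon)^{1/2}}{\psi_\mu^2} = \frac{L\,r^{1/2}}{\psi_\mu^2}.$

For example, with $d=10^4$ and $r=100$ the second bound is smaller by a factor $\sqrt{d/r}=10$, ignoring how $\psi_\mu$ itself depends on the model.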

For targets that are perturbations of high-dimensional Gaussians, sharp Wasserstein contraction rates for MALA with semi-implicit Euler proposals are shown to be uniform in the dimension $d$ under mild smoothness and convexity assumptions; the optimal diffusion-limit rates are recovered as $h\to 0$ (Eberle, 2012).

When applied to non-smooth or composite log-densities, subdifferential MASLA and proximal MALA retain exponential convergence under broader conditions, while empirical studies confirm practical superiority to standard MALA, unadjusted Langevin, or non-gradient MCMC in the presence of non-differentiability (Pereyra, 2013, Ning, 9 Jul 2025).
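The subdifferential idea can be illustrated by keeping the MALA proposal and accept/reject step and substituting a chosen subgradient for the gradient. The sketch below is a simplified stand-in, not the MASLA construction itself: it assumes the nonconvex, locally Lipschitz potential $U(x)=\sum_i\big||x_i|-1\big|$ and picks one Clarke subgradient per point via `np.sign` (which returns 0 at the kinks) in place of a general measurable conservative field. Because the Metropolis–Hastings ratio uses the same deterministic proposal mean in both directions, the chain still leaves $\pi\propto\exp(-U)$ invariant. The names `U`, `subgrad_U`, and `subgradient_mala` are hypothetical.

```python
import numpy as np


def U(x):
    # Nonconvex, non-differentiable potential: sum_i | |x_i| - 1 |
    return np.sum(np.abs(np.abs(x) - 1.0))


def subgrad_U(x):
    # One element of the Clarke subdifferential; np.sign gives 0 at x_i = 0 and |x_i| = 1
    return np.sign(np.abs(x) - 1.0) * np.sign(x)


def subgradient_mala(x0, h, n_iter, seed=0):
    """MALA with a subgradient in the drift; returns (samples, acceptance_rate)."""
    rng = np.random.default_rng(seed)
    mean = lambda z: z - 0.5 * h * subgrad_U(z)
    # log q(frm -> to) up to a constant
    log_q = lambda to, frm: -np.sum((to - mean(frm)) ** 2) / (2.0 * h)
    x = np.asarray(x0, dtype=float)
    samples, accepted = np.empty((n_iter, x.size)), 0
    for i in range(n_iter):
        y = mean(x) + np.sqrt(h) * rng.standard_normal(x.size)
        log_alpha = U(x) - U(y) + log_q(x, y) - log_q(y, x)
        if rng.uniform() < np.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        samples[i] = x
    return samples, accepted / n_iter
```

With `samples, acc = subgradient_mala(np.ones(2), h=0.5, n_iter=30000)`, the post-burn-in average of $|x_i|$ should be close to $(e^{-1}+2)/(2-e^{-1})\approx 1.45$, which is the exact value of $\mathbb{E}|x_i|$ under this target.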

5. Geometric, Manifold, and Preconditioned Approaches

Position-dependent and Riemannian MALA: Allows local matrix-valued preconditioning for anisotropic and curved targets. Requires careful incorporation of divergence (or connection/Christoffel) terms in the drift to guarantee stationarity under Lebesgue measure (Xifara et al., 2013). In many statistical models, using the Fisher information metric or Hessian of the negative log-likelihood as the local metric delivers robust adaptation to geometry and mixing times substantially superior to identity-preconditioned MALA.
Geometric Adaptive Langevin Dynamics (GALA): Formulates the Langevin equation on a Riemannian manifold and discretizes with the correct drift and diffusion, as prescribed by stochastic development and Itô calculus. Empirical studies show order-of-magnitude gains in mixing, acceptance, and statistical efficiency over Euclidean MALA and existing Riemannian variants, especially in ill-conditioned or high-dimensional regimes (Mamajiwala et al., 2022).
Fisher-adaptive MALA: Online estimation of the Fisher information allows scaling proposals optimally with respect to the statistical geometry of the posterior, substantially improving effective sample size and autocorrelation, particularly in inverse problems and ill-conditioned posteriors (Titsias, 2023, Wang et al., 12 Mar 2025).
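The drift correction for position-dependent preconditioning can be written out explicitly. As a sketch in this document's convention $\pi\propto\exp(-U)$, let $A(x)$ be a symmetric positive definite, position-dependent preconditioner (for example $A=G^{-1}$ for a metric tensor $G$; the symbol $A$ is introduced here for illustration). The diffusion

$dX_t = \big[-A(X_t)\nabla U(X_t) + (\nabla\cdot A)(X_t)\big]\,dt + \sqrt{2A(X_t)}\,dW_t,\qquad (\nabla\cdot A)_i(x) = \sum_{j=1}^{d} \partial_j A_{ij}(x),$

leaves $\pi$ invariant (the probability flux $b_i\pi-\sum_j\partial_j(A_{ij}\pi)$ vanishes for this drift $b$) and reduces to the overdamped Langevin diffusion $dX_t=-\nabla U(X_t)\,dt+\sqrt{2}\,dW_t$ when $A\equiv I_d$. An Euler–Maruyama step of length $h/2$ then gives the Gaussian proposal

$X' \sim \mathcal{N}\big(X - \tfrac{h}{2}A(X)\nabla U(X) + \tfrac{h}{2}(\nabla\cdot A)(X),\; hA(X)\big),$

whose full density, including the factor $\det A(X)^{-1/2}$ (which no longer cancels because the covariance depends on the current state), enters the Metropolis–Hastings ratio.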

6. Applications and Practical Considerations

MALA and its variants are widely used in Bayesian statistical inference, computational imaging, hierarchical and high-dimensional models, nonparametric regression, machine learning, and inverse problems. Proximal and subdifferential MALA extend applicability to convex and nonconvex composite posteriors (e.g., Bayesian lasso, total-variation image deconvolution, low-rank matrix factorization, neural network weight posteriors with ReLU activations) (Pereyra, 2013, Ning, 9 Jul 2025).

Efficient computation of the proximity operator or Fisher matrix is crucial for practical deployment. For smooth potentials, gradient-based MALA is computationally simpler, while for non-smooth or structurally sparse models, proximal or thresholded variants offer orders-of-magnitude better mixing and credible region estimation (Pillai, 2022, Schreck et al., 2013).
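When a preconditioner is available, it enters MALA as a fixed positive definite matrix $M$ in the proposal $X' = X - \frac{h}{2}M\nabla U(X) + \sqrt{h}\,M^{1/2}Z$. The sketch below is a simplified two-stage scheme, not the streaming update used by Fisher-adaptive MALA: it estimates the Fisher-type matrix $\mathbb{E}_\pi[\nabla U\,\nabla U^\top]$ by an averaged outer product of gradients (with a small ridge term, an assumption added for numerical stability) and uses its inverse as $M$. For a Gaussian target $\mathcal{N}(0,\Sigma)$ this matrix equals $\Sigma^{-1}$, so the ideal preconditioner is $M=\Sigma$. The function names are illustrative.

```python
import numpy as np


def estimate_fisher(grads, reg=1e-6):
    """Average outer product of gradient vectors, plus a small ridge term."""
    G = np.asarray(grads)
    return G.T @ G / G.shape[0] + reg * np.eye(G.shape[1])


def preconditioned_mala(x0, U, grad_U, M, h, n_iter, seed=0):
    """MALA with a fixed preconditioner M; returns (samples, grads, acceptance_rate)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(M)   # chol @ chol.T = M, used as M^{1/2}
    M_inv = np.linalg.inv(M)
    mean = lambda z: z - 0.5 * h * M @ grad_U(z)

    def log_q(to, frm):
        # log N(to; mean(frm), h M) up to a constant (M is fixed, so the determinant cancels)
        r = to - mean(frm)
        return -r @ M_inv @ r / (2.0 * h)

    x = np.asarray(x0, dtype=float)
    samples, grads, accepted = [], [], 0
    for _ in range(n_iter):
        y = mean(x) + np.sqrt(h) * chol @ rng.standard_normal(x.size)
        log_alpha = U(x) - U(y) + log_q(x, y) - log_q(y, x)
        if rng.uniform() < np.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        samples.append(x)
        grads.append(grad_U(x))  # gradient history, e.g. for a later Fisher estimate
    return np.array(samples), np.array(grads), accepted / n_iter
```

In a typical workflow one would run a short pilot chain with `M = np.eye(d)`, pass its `grads` to `estimate_fisher`, and restart with `M = np.linalg.inv(F)`; keeping $M$ fixed during the final run preserves the standard invariance argument.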

In high dimensions, careful tuning of the step size to target the optimal acceptance probability, preconditioning (either geometric or Fisher-adaptive), and possible use of locally adaptive step size rules (autoMALA) are empirically validated strategies for maximizing sampling efficiency (Biron-Lattes et al., 2023, Titsias, 2023, Wang et al., 12 Mar 2025).

7. Limitations, Open Problems, and Future Directions

While the theory for MALA’s efficiency and optimal scaling is mature for smooth, strongly log-concave, or high-dimensional product targets, less is known for general non-log-concave, multimodal, or manifold-constrained sampling (Chewi et al., 2020, Mangoubi et al., 2019, Chang et al., 2023). Recent work extends MALA to locally Lipschitz and non-differentiable settings, but full non-asymptotic mixing time bounds and robustness guarantees remain incomplete for these regimes (Ning, 9 Jul 2025).

Further research directions include: rigorous mixing time analysis for MALA with hard constraints and in the presence of support restrictions (Chang et al., 2023); adaptive and online strategies for learning geometric and Fisher preconditioners at scale; principled extension of local adaptation (autoMALA) to ensure irreducibility across all regimes; integration of backpropagation-free techniques to enable scalable Bayesian neural network inference (Cobb et al., 23 May 2025); and hybridization with Hamiltonian Monte Carlo for high-curvature or multimodal posteriors (Bieringer et al., 2023).

References:

(Pillai et al., 2011, Pillai, 2022, Chewi et al., 2020, Pereyra, 2013, Xifara et al., 2013, Schreck et al., 2013, Titsias, 2023, Wang et al., 12 Mar 2025, Biron-Lattes et al., 2023, Ning, 9 Jul 2025, Cobb et al., 23 May 2025, Eberle, 2012, Mangoubi et al., 2019, Mamajiwala et al., 2022, Chen et al., 2023, Tang et al., 2022, Kuntz et al., 2016, Bieringer et al., 2023, Chang et al., 2023)