Moderate Deviation Principle

Updated 18 December 2025

The Moderate Deviation Principle is a framework that characterizes intermediate-scale deviations in stochastic systems, linking Gaussian CLT fluctuations to exponential LDP events.
It employs weak convergence and variational representations to derive quadratic rate functions that capture the asymptotic behavior of moderate deviations.
MDPs have been established for finite- and infinite-dimensional models and for multiscale diffusions, and their variational structure informs rare-event simulation.

The Moderate Deviation Principle (MDP) is a mathematical framework for characterizing the asymptotic probabilities of deviations of stochastic systems on scales intermediate between those governed by the central limit theorem (CLT) and large deviation principles (LDP). MDP theory occupies a central role in modern probability theory, stochastic analysis, and applied probability, providing precise Gaussian-type asymptotics for rare events that are more probable than exponential-scale large deviations but much less probable than typical fluctuations. The MDP has been rigorously developed for a wide array of finite- and infinite-dimensional systems, including stochastic differential equations (SDEs), stochastic partial differential equations (SPDEs), multiscale diffusions, Markov processes, random fields, branching structures, and statistical functionals.

1. Foundational Framework and General Notion

Let $\{Z^\varepsilon\}$ be a family of random elements in a Polish space $X$ (e.g., $X = C([0,1];\mathbb{R}^n)$ ). The family satisfies a Moderate Deviation Principle with speed $v(\varepsilon)\to\infty$ and good rate function $I:X\to[0,\infty]$ if, for every bounded continuous functional $a:X\to\mathbb{R}$ ,

$\limsup_{\varepsilon \to 0} -\frac{1}{v(\varepsilon)} \log \mathbb{E}\exp\left(-v(\varepsilon)a(Z^\varepsilon)\right) \le \inf_\phi [I(\phi) + a(\phi)],$

$\liminf_{\varepsilon \to 0} -\frac{1}{v(\varepsilon)} \log \mathbb{E}\exp\left(-v(\varepsilon)a(Z^\varepsilon)\right) \ge \inf_\phi [I(\phi) + a(\phi)].$

This is equivalent to an LDP-type statement on probabilities: for any measurable set $A$ ,

$-\inf_{\phi \in A^\circ} I(\phi) \le \liminf_{\varepsilon\to 0} \frac{1}{v(\varepsilon)} \log \mathbb{P}(Z^\varepsilon \in A) \le \limsup_{\varepsilon\to 0} \frac{1}{v(\varepsilon)} \log \mathbb{P}(Z^\varepsilon \in A) \le -\inf_{\phi \in \bar{A}} I(\phi).$

The MDP typically concerns the regime where the random object is normalized by a scaling $a(\varepsilon)$ (a scalar function, not to be confused with the test functional $a(\cdot)$ above) with $a(\varepsilon)\to 0$ , $a(\varepsilon)/\sqrt{\varepsilon}\to\infty$ , so that the deviations are small but dominate the CLT scale $\sqrt{\varepsilon}$ . The rate function $I$ is quadratic for classical independent or weakly dependent sequences, but in general it reflects the structure of the underlying stochastic system (Tsirelson, 2018, Tsirelson, 2017, Morse et al., 2016).
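As a numerical illustration of the probability bounds above, the following sketch checks the classical i.i.d. case exactly, without simulation. It assumes Rademacher ( $\pm 1$ ) steps with unit variance, the scaling $a_n = n^{1/6}$ , and the threshold $x = 1$ ; these choices and the function names are illustrative and are not taken from the cited works. For $S_n$ the sum of $n$ such steps, the classical MDP predicts $-a_n^{-2} \log \mathbb{P}(S_n \ge x\sqrt{n}\,a_n) \to x^2/2$ .

```python
import math


def log_upper_tail(n, k_min):
    """Return log P(K >= k_min) for K ~ Binomial(n, 1/2), computed by log-sum-exp."""
    const = math.lgamma(n + 1) - n * math.log(2.0)
    log_terms = [
        const - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        for k in range(k_min, n + 1)
    ]
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def mdp_ratio(n, x=1.0, power=1.0 / 6.0):
    """Return -(1/a_n^2) * log P(S_n >= x * sqrt(n) * a_n) with a_n = n**power."""
    a_n = n ** power
    threshold = x * math.sqrt(n) * a_n
    k_min = math.ceil((n + threshold) / 2.0)  # S_n = 2K - n with K ~ Binomial(n, 1/2)
    return -log_upper_tail(n, k_min) / a_n ** 2


if __name__ == "__main__":
    for n in (10**2, 10**4, 10**6):
        print(n, round(mdp_ratio(n), 3))
```

The printed ratios should decrease toward $x^2/2 = 0.5$ as $n$ grows. Convergence is slow because the polynomial prefactor of the Gaussian tail contributes a correction of order $\log(a_n)/a_n^2$ .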

2. Archetypal Examples and Scaling Regimes

The way the MDP interpolates between the LDP (Freidlin–Wentzell) and the CLT is well illustrated by small-noise SDEs: $dX^\varepsilon_t = b(X^\varepsilon_t) dt + \sqrt{\varepsilon} \sigma(X^\varepsilon_t) dB_t, \quad X^\varepsilon_0 = x_0.$

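Large Deviations (LDP): Fluctuations of $X^\varepsilon$ at scale $O(1)$ , with rate function $I_{LDP}$ and speed $1/\varepsilon$ .
Central Limit (CLT): Fluctuations of $(X^\varepsilon_t - x_0)/\sqrt{\varepsilon}$ , with a Gaussian limit.
Moderate Deviations (MDP): Centered and scaled fluctuation $Y^\varepsilon_t = (X^\varepsilon_t - x_0)/a(\varepsilon)$ , with $a(\varepsilon)\to 0$ , $a(\varepsilon)/\sqrt{\varepsilon}\to\infty$ , at speed $a(\varepsilon)^2/\varepsilon$ . The rate function for the MDP is then a quadratic form reflecting the linearization of the dynamics around $x_0$ (Ma et al., 2011).

Centering at $x_0$ presumes that $x_0$ is an equilibrium of the drift, $b(x_0)=0$ , so that the zero-noise solution stays at $x_0$ ; for a general starting point one centers at the zero-noise solution $X^0_t$ instead.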
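
The speed $a(\varepsilon)^2/\varepsilon$ and the linearized quadratic rate function can be motivated by a short heuristic computation. This is a sketch under the assumptions that $b$ is differentiable, $\sigma$ is continuous, and $b(x_0)=0$ ; it is not a proof and is not taken verbatim from the cited works. Substituting $X^\varepsilon_t = x_0 + a(\varepsilon) Y^\varepsilon_t$ into the SDE gives

$dY^\varepsilon_t = \frac{b(x_0 + a(\varepsilon) Y^\varepsilon_t)}{a(\varepsilon)} dt + \frac{\sqrt{\varepsilon}}{a(\varepsilon)} \sigma(x_0 + a(\varepsilon) Y^\varepsilon_t) dB_t, \quad Y^\varepsilon_0 = 0.$

Since $a(\varepsilon)\to 0$ , a first-order Taylor expansion of the drift suggests

$dY^\varepsilon_t \approx Db(x_0) Y^\varepsilon_t dt + \sqrt{\delta(\varepsilon)} \sigma(x_0) dB_t, \quad \delta(\varepsilon) = \varepsilon/a(\varepsilon)^2 \to 0.$

This is a linear small-noise SDE with noise intensity $\delta(\varepsilon)$ , so Freidlin–Wentzell theory for the linear equation suggests an LDP for $Y^\varepsilon$ with speed $1/\delta(\varepsilon) = a(\varepsilon)^2/\varepsilon$ and a quadratic rate function.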

The same paradigm occurs in:

Occupancy processes and empirical statistics in queueing and interacting particle systems (Wang et al., 5 Mar 2025, Feng et al., 31 Oct 2025),
SPDEs and infinite-dimensional diffusions (Fatheddin et al., 2014, Gasteratos et al., 2020, Kumar et al., 2024),
Markov-modulated or multiscale systems with fast and slow components (Morse et al., 2016, Bourguin et al., 2022, Qian, 28 Nov 2025),
Random field integrals and spatial moderate deviations in non-i.i.d. settings (Tsirelson, 2018, Tsirelson, 2019),
Additive functionals of distribution-dependent SDEs (DDSDEs) (Ren et al., 2021),
Sums of martingale differences and branching/Markov tree structures (Penda et al., 2011).

MDP theory also enters information theory, where it quantifies the decay of the error probability of codes whose rate approaches capacity at an intermediate speed (Altug et al., 2012).

3. Variational Rate Functions and Skeleton Equations

For independent or weakly dependent systems, the MDP rate function $I$ is quadratic; for sums of i.i.d. variables with variance $\sigma^2$ ,

$I(x) = \frac{x^2}{2\sigma^2}.$

For Markov processes and SDEs (small noise, moderate rescaling), the rate function is identified via a stochastic control problem or skeleton equation:

$I(\phi) = \frac{1}{2} \int_0^T \langle Q^{-1}(\dot\phi_s - D b(x_0) \phi_s), (\dot\phi_s - D b(x_0) \phi_s)\rangle ds,$

for absolutely continuous paths with $\phi(0)=0$ , and $I(\phi)=\infty$ otherwise. Here $Q=\sigma(x_0)\sigma(x_0)^T$ is assumed invertible and $x_0$ is an equilibrium of the drift; for a general starting point, $Db$ and $\sigma$ are evaluated along the zero-noise solution (Ma et al., 2011, Morse et al., 2016). For infinite-dimensional SPDEs (e.g., 2D Navier–Stokes or Burgers–Huxley), the MDP rate function is given in terms of solutions to controlled deterministic PDEs or variational inequalities (Dong et al., 2015, Fatheddin et al., 2014, Kumar et al., 2024).
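
To make the quadratic path functional concrete, the sketch below evaluates it numerically in a scalar special case. It assumes $Db(x_0) = \beta = -1$ , $Q = q = 1$ , $T = 1$ , and a terminal constraint $\phi(T) = y = 1$ ; these values, the midpoint discretization, and the function names are illustrative assumptions. For this linear-quadratic problem the minimizing path is $\phi(s) = y\sinh(\beta s)/\sinh(\beta T)$ with cost $y^2/(2V)$ , where $V = q(e^{2\beta T}-1)/(2\beta)$ is the terminal variance of the linearized SDE, so the minimal cost has the Gaussian form $x^2/(2\sigma^2)$ .

```python
import math


def rate_function(phi, beta, q, T, steps=2000):
    """Approximate I(phi) = 1/(2q) * int_0^T (phi'(s) - beta*phi(s))^2 ds (midpoint rule)."""
    h = T / steps
    total = 0.0
    for i in range(steps):
        left, right = phi(i * h), phi((i + 1) * h)
        slope = (right - left) / h   # finite-difference derivative on the cell
        mid = 0.5 * (left + right)   # approximate path value at the cell midpoint
        total += (slope - beta * mid) ** 2 * h
    return total / (2.0 * q)


# Illustrative parameters (assumptions, not from the source).
beta, q, T, y = -1.0, 1.0, 1.0, 1.0


def optimal_path(s):
    """Minimizer of the cost among paths with phi(0) = 0 and phi(T) = y."""
    return y * math.sinh(beta * s) / math.sinh(beta * T)


def straight_path(s):
    """A suboptimal comparison path with the same endpoints."""
    return y * s / T


terminal_variance = q * (math.exp(2.0 * beta * T) - 1.0) / (2.0 * beta)
closed_form = y ** 2 / (2.0 * terminal_variance)

if __name__ == "__main__":
    print("closed form:  ", closed_form)
    print("optimal path: ", rate_function(optimal_path, beta, q, T))
    print("straight path:", rate_function(straight_path, beta, q, T))
```

The optimal path should reproduce the closed-form value up to discretization error, while the straight line between the same endpoints costs strictly more.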

In multiscale systems, the rate function may involve Poisson equations (invariant measures of the fast process), leading to non-local or degenerate quadratic forms (Morse et al., 2016, Qian, 28 Nov 2025, Bourguin et al., 2022). For Markovian random environments, the rate function can be expressed as a minimum over admissible control and occupation measures, encoding the pathwise skeleton of the system (Qian, 28 Nov 2025).

For empirical statistical functionals (e.g., Pearson’s $\chi^2$ statistic), quadratic rate functions emerge from martingale MDPs (Yu et al., 17 Aug 2025), while for spatial random fields with hierarchical dependence, the quadratic form is retained but only up to a critical (logarithmic) window in the scale of deviations (Tsirelson, 2018, Tsirelson, 2017).

4. Methodological Core: Weak Convergence and Variational Representation

Many modern proofs of MDPs rely on the weak convergence approach, which establishes the equivalent Laplace principle:

The scaled log-Laplace functional $-\frac{1}{v(\varepsilon)} \log \mathbb{E} \exp\left(-v(\varepsilon) a(Z^\varepsilon)\right)$ is represented as the infimum of a stochastic control cost, in which a control shifts the drift of the driving noise (via Girsanov's theorem, i.e., exponential tilting) and is penalized by a quadratic cost.
In infinite-dimensional or non-Markovian settings, controlled processes or occupation measures are introduced; their tightness and identification of limits require delicate dissipation, smoothing, or coupling estimates (Dong et al., 2015, Gasteratos et al., 2020).
Rate function minimizers are constructed explicitly either via solution of deterministic skeleton equations or stochastic control systems with feedback (Morse et al., 2016, Qian, 28 Nov 2025, Tsirelson, 2018).
For fields admitting hierarchical decomposition (splitting/leaks), scale-by-scale recurrence and cumulant generating function expansion are used to obtain quadratic response and establish the MDP via the Gärtner–Ellis theorem (Tsirelson, 2016, Tsirelson, 2017, Tsirelson, 2019).
When applicable, the contraction principle lifts simple MDPs (e.g., of martingales or random walks) to more complex functionals by continuity (Feng et al., 31 Oct 2025, Tsirelson, 2018).
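
For Brownian-driven systems, the representation in the first step is commonly based on the Boué–Dupuis variational formula. The source does not state it, so the displays below are a standard form added for orientation. Here $F$ is a bounded measurable functional on $C([0,T];\mathbb{R}^d)$ (playing the role of the test functional in Section 1) and $\mathcal{A}$ is the set of progressively measurable controls with $\mathbb{E}\int_0^T |u_s|^2 ds < \infty$ :

$-\log \mathbb{E} \exp\left(-F(B)\right) = \inf_{u \in \mathcal{A}} \mathbb{E}\left[\frac{1}{2}\int_0^T |u_s|^2 ds + F\left(B + \int_0^\cdot u_s ds\right)\right].$

As a sketch for the small-noise SDE of Section 2, assuming $b(x_0)=0$ and speed $v(\varepsilon) = a(\varepsilon)^2/\varepsilon$ , applying the formula to $v(\varepsilon)F(Y^\varepsilon)$ and rescaling the control by $\sqrt{v(\varepsilon)}$ gives

$-\frac{1}{v(\varepsilon)} \log \mathbb{E}\exp\left(-v(\varepsilon) F(Y^\varepsilon)\right) = \inf_{u \in \mathcal{A}} \mathbb{E}\left[\frac{1}{2}\int_0^T |u_s|^2 ds + F(Y^{\varepsilon,u})\right],$

where $Y^{\varepsilon,u} = (X^{\varepsilon,u} - x_0)/a(\varepsilon)$ and the controlled process solves

$dX^{\varepsilon,u}_t = b(X^{\varepsilon,u}_t) dt + a(\varepsilon) \sigma(X^{\varepsilon,u}_t) u_t dt + \sqrt{\varepsilon} \sigma(X^{\varepsilon,u}_t) dB_t.$

Formally, as $\varepsilon \to 0$ the controlled fluctuation $Y^{\varepsilon,u}$ converges to the solution of the skeleton equation $\dot\phi_t = Db(x_0)\phi_t + \sigma(x_0) u_t$ , $\phi_0 = 0$ , and the rate function is the least cost $\frac{1}{2}\int_0^T |u_s|^2 ds$ over controls producing a given path $\phi$ . Making this limit rigorous is the role of the tightness and identification arguments described above.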

5. Non-classical Phenomena and Structural Insights

A variety of qualitative phenomena arise in the MDP regime far beyond the i.i.d. case:

Memory and Non-Markovianity: In slow–fast stochastic systems driven by fractional Brownian motion, MDP rate functions can be discontinuous in the Hurst parameter at $H=1/2$ . For $H > 1/2$ (long memory), the rate function for moderate deviations shows non-Markovian effects, and Gaussian tail asymptotics differ from the Brownian regime (Bourguin et al., 2022).
Multiscale Averaging and Poisson Structure: For systems with time-scale separation, moderate deviations bridge between Gaussian fluctuation regimes and large deviations, but their rate functions must account for the ergodic properties and Poisson equations associated with averaging or homogenization (Morse et al., 2016, Qian, 28 Nov 2025, Gasteratos et al., 2020).
Spatial and Random Field Hierarchies: For spatially extended random fields with only weak independence (splittability, hierarchical models), MDP holds only at scales up to $o(\sqrt{V}/\log^d V)$ , reflecting the combinatorics of leaks across scales (Tsirelson, 2018, Tsirelson, 2017, Tsirelson, 2019).
Statistical and Information-Theoretic Applications: In information theory, channel coding at rates approaching capacity at moderate speed exhibits error probabilities with exponential asymptotics governed by a universal quadratic rate function, interpolating CLT and LDP error exponents (Altug et al., 2012).
Rare-Event Simulation: The explicit variational structure of MDPs directly informs efficient importance-sampling algorithms for rare events in stochastic simulation, particularly in multiscale and infinite-dimensional settings (Morse et al., 2016).
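
The link between a quadratic rate function and importance sampling can be illustrated on a toy problem. The sketch below is not the scheme of Morse et al. (2016); it is a minimal exponential-tilting estimator for the tail of a sum of $n$ independent $\pm 1$ steps, with the tilt $\theta = s_{\min}/n$ chosen as the maximizer of $\theta s_{\min} - n\theta^2/2$ , i.e., from the quadratic (moderate-deviation) approximation of the cumulant generating function. The threshold, sample size, seed, and function names are illustrative assumptions.

```python
import math
import random


def exact_tail(n, s_min):
    """Exact P(S_n >= s_min) for S_n a sum of n independent +1/-1 steps."""
    k_min = math.ceil((n + s_min) / 2)  # S_n = 2K - n with K ~ Binomial(n, 1/2)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n


def tilted_estimate(n, s_min, num_samples, rng):
    """Importance-sampling estimate of P(S_n >= s_min) under an exponential tilt."""
    theta = s_min / n  # tilt suggested by the quadratic approximation
    p_up = math.exp(theta) / (2.0 * math.cosh(theta))  # tilted probability of a +1 step
    log_norm = n * math.log(math.cosh(theta))
    total = 0.0
    for _ in range(num_samples):
        s = sum(1 if rng.random() < p_up else -1 for _ in range(n))
        if s >= s_min:
            # Likelihood ratio dP/dQ of the original law against the tilted law.
            total += math.exp(-theta * s + log_norm)
    return total / num_samples


if __name__ == "__main__":
    n = 400
    s_min = math.sqrt(n) * n ** (1.0 / 6.0)  # moderate-deviation threshold with x = 1
    rng = random.Random(0)
    print("exact:   ", exact_tail(n, s_min))
    print("estimate:", tilted_estimate(n, s_min, 4000, rng))
```

Under the tilted law the event is no longer rare, so most samples contribute, and the likelihood ratio keeps the estimator unbiased for the original probability.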

6. Representative Results Across Domains

Setting	Scaling Regime	Rate Function Structure
Sums of i.i.d., Markov, mixing	$a_n\to\infty$ , $a_n=o(\sqrt{n})$	$I(x)=x^2/(2\sigma^2)$
Small-noise finite-dim SDE	$a(\varepsilon)\to 0$ , $a(\varepsilon)\gg\sqrt{\varepsilon}$	Quadratic form via linearized skeleton eq
Infinite-dim SPDE (e.g., Navier-Stokes/Burgers)	$a(\varepsilon)\to 0$ , $a(\varepsilon)\gg\sqrt{\varepsilon}$	Quadratic rate, via skeleton PDE
Slow–fast diffusions, averaging	$h(\varepsilon)\to\infty$ , $h(\varepsilon)\sqrt{\varepsilon}\to 0$	Quadratic, Poisson Eq, effective coefficients
Jump Markov processes, occupation	$a(\varepsilon)\to\infty$ , $a(\varepsilon)\sqrt{\varepsilon}\to 0$	Controlled martingale/occupation measures
Random fields, hierarchical (splittable)	Deviations up to $o(\sqrt{\mathrm{vol}(B)}/\log^d \mathrm{vol}(B))$	$I(x)=x^2/(2C^2)$ up to critical scale
Channel coding near capacity	$\epsilon_n\to 0$ , $\epsilon_n\sqrt{n}\to\infty$	$I(x)=x^2/(2\sigma^2(W))$

The table above is illustrative and non-exhaustive, summarized from (Tsirelson, 2018, Tsirelson, 2016, Ma et al., 2011, Morse et al., 2016, Dong et al., 2015, Kumar et al., 2024, Qian, 28 Nov 2025, Ren et al., 2021, Yu et al., 17 Aug 2025, Altug et al., 2012).

7. Significance and Outlook

The MDP framework systematizes the fluctuation theory of random systems beyond Gaussian approximations, giving precise Gaussian-type tail asymptotics for deviations larger than those captured by the CLT yet smaller than those of the LDP regime. It applies across linear and nonlinear, Markovian and non-Markovian, finite- and infinite-dimensional, and multiscale systems.

A key insight is that "moderate" rare events remain governed by local or linearized dynamics even in strongly interacting or non-reversible environments, but the structure of the rate function can reveal nontrivial dependencies, discontinuities, or effective coefficients, especially in the presence of memory or multi-scale effects (Bourguin et al., 2022, Morse et al., 2016).

Current research continues to expand the landscape of MDP theory, especially for interacting particle systems out of equilibrium (Zhao, 2024), mean-field or nonlinear Markov models (Ren et al., 2021), rough-path-driven systems, and spatially extended models with minimal mixing (Tsirelson, 2018, Tsirelson, 2019). The weak convergence and control-theoretic methodology also yields constructive approaches to rare-event simulation and uncertainty quantification.