Entropic Gradient Flows
- Entropic gradient flow is defined by using entropy (e.g., Boltzmann–Shannon) as the driving functional to model transport, diffusion, and stochastic phenomena.
- It underpins efficient numerical methods such as Sinkhorn iterations in entropic regularized optimal transport and modified JKO schemes that enable parallel computing.
- The framework unifies variational, geometric, and probabilistic perspectives, applying to reaction-diffusion, discrete Markov processes, and kinetic equations.
Entropic gradient flow designates a class of gradient flows in which the central driving functional is (a variant of) entropy—typically the Boltzmann–Shannon (Kullback–Leibler) entropy or its relatives—evolved in a geometric structure reflecting underlying transport, reaction, or probabilistic features. Its archetype is the realization of the heat (or Fokker–Planck) equation as the Wasserstein steepest descent of the entropy, but the concept encompasses a variety of systems, including those governed by entropic-regularized optimal transport, mean-field models, reaction/transport equations in unbalanced spaces, discrete Markov processes, and more. Numerical schemes leveraging entropic regularization have revolutionized optimal transport computation and enabled fast, stable implementations on large-scale problems. Entropic gradient flows enjoy robust variational and geometric foundations, admit large-deviations interpretations, and unify stochastic, geometric, and information-theoretic perspectives.
1. Entropic Regularized Optimal Transport and the Entropic JKO Scheme
Let $\mu,\nu$ be probability measures on $\mathbb{R}^d$ with quadratic cost $c(x,y)=|x-y|^2$. The standard 2-Wasserstein distance is defined via the Monge–Kantorovich problem, while the entropic regularized optimal transport (also known as the Schrödinger problem) minimizes
$$W_{2,\varepsilon}^2(\mu,\nu) \;=\; \min_{\gamma\in\Pi(\mu,\nu)} \int |x-y|^2 \, d\gamma(x,y) \;+\; \varepsilon\,\mathrm{KL}(\gamma \mid \mu\otimes\nu),$$
with $\mathrm{KL}(\cdot\mid\cdot)$ the Kullback–Leibler divergence. This regularization renders the problem strictly convex, smooth, and numerically amenable via Sinkhorn's algorithm through iterative scaling of the two dual potentials (scaling vectors $a$, $b$) (Carlier et al., 2015).
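As a concrete illustration of the scaling iterations, the following is a minimal NumPy sketch of Sinkhorn's algorithm for discrete entropic OT; the 1D grid, bump-shaped test measures, regularization strength, and iteration count are illustrative assumptions, not settings from any cited implementation.

```python
# Minimal Sinkhorn sketch: rescale the Gibbs kernel K = exp(-C/eps) so that the
# coupling gamma = diag(a) K diag(b) matches the prescribed marginals mu and nu.
import numpy as np

def sinkhorn(mu, nu, C, eps, n_iter=500):
    K = np.exp(-C / eps)
    a = np.ones_like(mu)
    for _ in range(n_iter):
        b = nu / (K.T @ a)          # enforce the second marginal
        a = mu / (K @ b)            # enforce the first marginal
    gamma = a[:, None] * K * b[None, :]
    # Entropic cost <C, gamma> + eps * KL(gamma | mu x nu); the tiny offset
    # inside the log only guards against underflowed coupling entries.
    cost = np.sum(gamma * C) + eps * np.sum(gamma * np.log(gamma / np.outer(mu, nu) + 1e-300))
    return gamma, cost

# Toy usage: two bump measures on a 1D grid with quadratic cost.
x = np.linspace(0.0, 1.0, 64)
mu = np.exp(-((x - 0.25) ** 2) / 0.02); mu /= mu.sum()
nu = np.exp(-((x - 0.75) ** 2) / 0.02); nu /= nu.sum()
C = (x[:, None] - x[None, :]) ** 2
gamma, cost = sinkhorn(mu, nu, C, eps=5e-2)
```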
The classical Jordan–Kinderlehrer–Otto (JKO) scheme realizes gradient flows by implicit Euler steps in the Wasserstein space,
$$\rho_{k+1} \in \operatorname*{arg\,min}_{\rho}\; \frac{1}{2\tau} W_2^2(\rho,\rho_k) + \mathcal{F}(\rho),$$
where $\mathcal{F}$ is a suitable convex functional. The entropic JKO scheme replaces $W_2^2$ with its entropic counterpart $W_{2,\varepsilon}^2$:
$$\rho_{k+1}^{\varepsilon} \in \operatorname*{arg\,min}_{\rho}\; \frac{1}{2\tau} W_{2,\varepsilon}^2(\rho,\rho_k) + \mathcal{F}(\rho).$$
This strictly convex, smooth problem can be efficiently solved with iterative Sinkhorn projections and is highly parallelizable (Carlier et al., 2015, Peyré, 2015).
Convergence of the entropic flow, as $\varepsilon \to 0$ (suitably coupled with the time step $\tau \to 0$), to the original (unregularized) gradient flow is established via evolutionary $\Gamma$-convergence. The entropy barrier improves stability, enables massive parallelism, and allows JKO steps that are robust even on complex computational domains (Carlier et al., 2015, Forkert et al., 2020).
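To make the entropic JKO step above concrete, here is a minimal sketch for the driving functional $\mathcal{F}(\rho)=\sum_i \rho_i(\log\rho_i-1)$ on a 1D grid, using Sinkhorn-like scaling iterations: the first marginal is constrained to $\rho_k$, while the free marginal is handled through the closed-form KL proximal map of the entropy, $s \mapsto s^{\varepsilon/(\varepsilon+2\tau)}$. The sketch penalizes $\varepsilon\,\mathrm{KL}(\gamma\mid K)$, which agrees with the entropic cost up to an additive constant under this convention; the grid, $\varepsilon$, $\tau$, and iteration count are illustrative choices rather than those of the cited schemes.

```python
# Sketch of one entropic JKO step for F(rho) = sum(rho * (log rho - 1)),
# solved by Sinkhorn-like scaling iterations (illustrative parameters).
import numpy as np

def entropic_jko_step(rho_k, C, eps, tau, n_iter=500):
    """Approximate argmin_rho  W_{2,eps}^2(rho, rho_k) / (2*tau) + F(rho)."""
    K = np.exp(-C / eps)
    lam = 2.0 * tau
    b = np.ones_like(rho_k)
    for _ in range(n_iter):
        a = rho_k / (K @ b)                  # hard constraint on the first marginal
        s = K.T @ a
        b = s ** (eps / (eps + lam)) / s     # KL-prox of the entropy, divided by s
    return b * (K.T @ a)                     # second marginal of the coupling

x = np.linspace(0.0, 1.0, 64)
rho0 = np.exp(-((x - 0.3) ** 2) / 0.02); rho0 /= rho0.sum()
C = (x[:, None] - x[None, :]) ** 2
rho1 = entropic_jko_step(rho0, C, eps=1e-2, tau=1e-2)   # one diffusive step
```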
2. Geometry, Variational Structure, and Large-Deviation Principles
Entropic gradient flows possess a natural variational structure grounded in generalized Riemannian geometry on the space of probability measures. The Otto calculus formalizes the geometric setting, allowing the identification of the 2-Wasserstein metric with a Riemannian structure whose "energy" is the entropy (Adams et al., 2012, Karatzas et al., 2020). The prototypical equation,
$$\partial_t \rho = \Delta \rho,$$
emerges as the steepest descent of the Boltzmann–Shannon entropy $\mathcal{H}(\rho)=\int \rho\log\rho \, dx$ under the Wasserstein metric.
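A one-line computation makes this identification explicit. In Otto's formalism, the Wasserstein gradient of a functional $\mathcal{F}$ is $-\nabla\cdot\bigl(\rho\,\nabla\tfrac{\delta\mathcal{F}}{\delta\rho}\bigr)$, so for the entropy,
$$\frac{\delta \mathcal{H}}{\delta \rho} = \log\rho + 1, \qquad \partial_t \rho = \nabla\cdot\Bigl(\rho\,\nabla \frac{\delta\mathcal{H}}{\delta\rho}\Bigr) = \nabla\cdot(\rho\,\nabla\log\rho) = \Delta\rho.$$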
A profound link exists between entropic gradient flows and large-deviation theory. The discrete-time JKO minimization is asymptotically equivalent to the large-deviation rate functional for many-particle systems (e.g., independent Brownian motions), tightly coupling microscopic stochastic fluctuations and macroscopic deterministic evolution (Adams et al., 2010, Adams et al., 2012). Thus, gradient-flow dynamics are identified as zero-action (most probable) curves in the path-space large-deviation principle, unifying probabilistic and variational perspectives.
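Schematically, and up to additive constants and the precise sense of convergence, the small-time rate functional for the empirical measure of independent Brownian particles started from $\rho_0$ takes the form
$$J_\tau(\rho \mid \rho_0) \;\approx\; \frac{1}{4\tau}\, W_2^2(\rho_0,\rho) \;+\; \tfrac{1}{2}\bigl(\mathcal{H}(\rho) - \mathcal{H}(\rho_0)\bigr) \qquad (\tau \to 0),$$
whose minimizers coincide with those of the JKO functional $\tfrac{1}{2\tau}W_2^2(\rho_0,\rho) + \mathcal{H}(\rho)$ with step $\tau$.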
3. Extensions: Sinkhorn Geometry, Discrete and Nonlocal Settings
Several entropic geometries have been developed to address settings beyond classical transport:
- Sinkhorn Divergences: The Sinkhorn divergence (debiased entropic OT) defines a geometry metrizing weak convergence and enables gradient flows via modified JKO schemes (Hardion et al., 18 Nov 2025). The induced geometry can be characterized via a reproducing kernel Hilbert space embedding, and admits efficient minimizing-movement schemes retaining well-posedness, contractivity, and global convergence to minimizers.
- Discrete Markov Chains: On finite spaces, a Benamou–Brenier-type nonlocal transport metric $\mathcal{W}$ is constructed so that the law of a reversible Markov chain evolves as the gradient flow of the entropy with respect to $\mathcal{W}$ (Maas, 2011). The discrete chain thus exactly parallels the Wasserstein-entropy gradient flow of diffusion; a numerical sketch of the resulting entropy dissipation follows this list.
- Porous Medium and Reaction Equations: Both in discrete spaces and for nonlinear Markov semigroups, non-Euclidean metrics and suitably chosen convex entropies yield porous-medium-type and reaction-diffusion equations as entropic gradient flows (Erbar et al., 2012, Kondratyev et al., 2017).
- Boltzmann and Landau Kinetics: For the spatially homogeneous Boltzmann or Landau equation, a nonlocal Onsager operator or metric is constructed so that the kinetic PDE is the entropy gradient flow in a "collision geometry" or through a nonlocal action (Erbar, 2016, Carrillo et al., 2020).
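To illustrate the discrete Markov-chain case numerically, the sketch below integrates the forward equation $\dot p = Q^{\mathsf T} p$ of a small reversible chain and checks that the relative entropy $H(p\mid\pi)$ decays monotonically toward zero, which is the gradient-flow signature; the nonlocal metric itself is not constructed, and the chain, step size, and horizon are arbitrary illustrative choices.

```python
# Sketch: entropy dissipation along the forward equation dp/dt = Q^T p of a
# reversible Markov chain (illustrative generator; the discrete transport
# metric of the gradient-flow formulation is not constructed here).
import numpy as np

rng = np.random.default_rng(0)
n = 5
pi = rng.random(n) + 0.5; pi /= pi.sum()       # stationary distribution
S = rng.random((n, n)); S = S + S.T            # symmetric weights
Q = S / pi[:, None]                            # detailed balance: pi_i Q_ij = pi_j Q_ji
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, -Q.sum(axis=1))            # generator rows sum to zero

def rel_entropy(p, pi):
    return float(np.sum(p * np.log(p / pi)))

p = rng.random(n); p /= p.sum()
dt = 1e-3
H = [rel_entropy(p, pi)]
for _ in range(5000):
    p = p + dt * (Q.T @ p)                     # explicit Euler for dp/dt = Q^T p
    H.append(rel_entropy(p, pi))
print(H[0], H[-1])                             # relative entropy decays toward 0
```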
4. Stochastic Processes, Schrödinger Problems, and Interpolations
Entropic gradient flows connect deeply with stochastic processes:
- Schrödinger Bridge & Entropic Interpolation: The Schrödinger problem seeks the most likely path ensemble connecting two marginals, i.e., the one of minimal relative entropy with respect to a reference diffusion. The resulting path-space minimizer (Schrödinger bridge) interpolates between marginals via entropic transport, converging in the small-noise limit to classical OT and in the large-horizon limit to heat flow, the entropy gradient flow itself (Clerc et al., 2020, Chen et al., 2016).
- Projected Langevin Dynamics: For entropy-regularized OT between prescribed marginals, the projected Langevin SDE provides a stochastic process constrained to the coupling manifold whose law evolves via the gradient flow of the entropic cost. Exponential convergence to optimal couplings is achieved under log-Sobolev inequalities (Conforti et al., 2023).
- Entropy Dissipation & Inequalities: Along entropic gradient flows in both regular and projected OT geometries, entropy decays at rates governed by projected Fisher information and log-Sobolev constants, offering quantitative convergence guarantees (Karatzas et al., 2020, Conforti et al., 2023).
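As a minimal quantitative illustration of such entropy-dissipation estimates in the classical (unprojected) setting, the Ornstein-Uhlenbeck flow $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ is the Wasserstein gradient flow of the relative entropy with respect to $N(0,1)$, and for Gaussian initial data the KL divergence is explicit, so the log-Sobolev decay rate $e^{-2t}$ can be checked directly; the initial mean and variance below are illustrative.

```python
# Sketch: exponential entropy dissipation along an Ornstein-Uhlenbeck flow.
# For Gaussian initial data the relative entropy to N(0,1) is explicit and
# obeys the log-Sobolev bound KL(t) <= exp(-2t) * KL(0).
import numpy as np

def kl_gaussian(m, v):
    """KL( N(m, v) || N(0, 1) )."""
    return 0.5 * (v + m * m - 1.0 - np.log(v))

m0, v0 = 1.0, 2.0                               # illustrative initial mean / variance
t = np.linspace(0.0, 3.0, 50)
m_t = m0 * np.exp(-t)                           # closed-form OU moments
v_t = 1.0 + (v0 - 1.0) * np.exp(-2.0 * t)
kl_t = kl_gaussian(m_t, v_t)
bound = kl_gaussian(m0, v0) * np.exp(-2.0 * t)  # log-Sobolev decay bound
assert np.all(kl_t <= bound + 1e-12)
```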
5. Numerical and Algorithmic Implications
Algorithmically, the principal benefit of entropic regularization is the reduction of high-dimensional, non-smooth transport problems to strictly convex, smooth optimization solvable by matrix scaling (Sinkhorn) or Dykstra's iterative projections (Carlier et al., 2015, Peyré, 2015). Each JKO/entropic step reduces to alternating KL projections, massively accelerating computations on grids, meshes, or graphs and allowing for GPU parallelization. For regular meshes and Gibbs kernels (Gaussian heat kernels), each scaling step reduces to a separable Gaussian convolution, with cost essentially linear in the number of grid points (Peyré, 2015).
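The kernel-as-convolution observation can be sketched directly: on a regular 2D grid with quadratic cost, the Gibbs kernel $K=\exp(-|x-y|^2/\varepsilon)$ is a Gaussian, so each application of $K$ in a scaling step can be computed by separable Gaussian filtering rather than by forming the dense $N\times N$ kernel. The snippet below uses scipy.ndimage.gaussian_filter as an illustrative stand-in; the grid size, test measures, $\varepsilon$, and iteration count are assumptions.

```python
# Sketch: apply the Gibbs kernel K = exp(-|x-y|^2/eps) on a regular grid via
# separable Gaussian filtering, storing only images instead of an N x N kernel.
import numpy as np
from scipy.ndimage import gaussian_filter

def apply_gibbs_kernel(v, eps):
    # exp(-r^2/eps) is a Gaussian with variance eps/2; gaussian_filter
    # normalizes the kernel, which only rescales the Sinkhorn potentials.
    return gaussian_filter(v, sigma=np.sqrt(eps / 2.0), mode="constant")

n, eps = 128, 100.0
yy, xx = np.mgrid[0:n, 0:n]
mu = np.exp(-((xx - 54.0) ** 2 + (yy - 54.0) ** 2) / 64.0); mu /= mu.sum()
nu = np.exp(-((xx - 74.0) ** 2 + (yy - 74.0) ** 2) / 64.0); nu /= nu.sum()

a = np.ones((n, n))
for _ in range(200):                            # Sinkhorn passes on images
    b = nu / (apply_gibbs_kernel(a, eps) + 1e-300)
    a = mu / (apply_gibbs_kernel(b, eps) + 1e-300)
```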
Eulerian implementations of the Sinkhorn-JKO scheme enjoy energy dissipation, global contractivity, and "teleportation" effects for mass transfer across barriers, unattainable in traditional Wasserstein gradient flows (Hardion et al., 18 Nov 2025). The convergence of Sinkhorn-JKO (SJKO) schemes to continuous flows can be established via monotonicity and maximal monotone operator theory in RKHS embeddings.
In discrete Markov and network settings, analogues of the JKO scheme can be devised using nonlocal transport metrics tailored to the generator, achieving convergence to continuum diffusivities under mesh refinement (Erbar et al., 2012, Forkert et al., 2020).
6. Generalizations, Function Spaces, and Future Directions
The entropic gradient flow paradigm generalizes to:
- Fisher–Rao Geometry: In mean-field games and min-max optimization with entropic regularization, the Fisher–Rao metric underlies birth–death flows that guarantee exponential convergence and admit explicit Lyapunov functions (Lascu et al., 2024); a minimal numerical sketch follows this list.
- Hellinger–Kantorovich Geometry: Nonlinear Fokker–Planck and reaction-diffusion equations are entropic gradient flows in an unbalanced OT metric, crucial for models involving creation/annihilation of mass (Kondratyev et al., 2017).
- Curves, Image Processing, Quantum Gravity: Entropic action functionals founded on quantum or information-theoretic relative entropy give rise to gradient flows in geometric settings, such as flows of planar curves or the anisotropic diffusion (Perona–Malik) in image processing (Bianconi, 18 Mar 2025, O'Donnell et al., 2021).
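For the Fisher–Rao case mentioned above, a minimal numerical sketch on a finite state space integrates the birth-death flow $\partial_t\rho_i = -\rho_i\bigl(\log(\rho_i/\pi_i) - \mathrm{KL}(\rho\mid\pi)\bigr)$, the Fisher–Rao gradient flow of the relative entropy, and observes the decay of $\mathrm{KL}(\rho\mid\pi)$; the target $\pi$, initialization, and step size are illustrative, and the explicit Euler discretization with renormalization is a convenience, not part of the cited formulations.

```python
# Sketch: Fisher-Rao (birth-death) gradient flow of KL(rho | pi) on a finite
# state space: d/dt rho_i = -rho_i * (log(rho_i / pi_i) - KL(rho | pi)).
import numpy as np

rng = np.random.default_rng(1)
n = 10
pi = rng.random(n) + 0.1; pi /= pi.sum()        # illustrative target
rho = rng.random(n) + 0.1; rho /= rho.sum()     # illustrative initialization

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

dt = 0.05
history = [kl(rho, pi)]
for _ in range(200):
    drive = np.log(rho / pi) - kl(rho, pi)      # first variation, mass-projected
    rho = rho * (1.0 - dt * drive)              # explicit Euler birth-death step
    rho = np.clip(rho, 1e-300, None); rho /= rho.sum()
    history.append(kl(rho, pi))
print(history[0], history[-1])                  # relative entropy decays
```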
Future research encompasses further abstraction to generalized dissipation potentials (beyond quadratic), connection with control theory, stochastic processes with constraints, and non-equilibrium thermodynamics, including recent formulations identifying entropy as a Noether charge—invariant under continuous symmetries in the Hamiltonian action, unifying reversibility and macroscopic conservation (Beyen et al., 2024).
7. Summary Table: Principal Entropic Gradient Flows and Geometries
| Setting / PDE | Metric / Geometry | Driving Functional | Reference |
|---|---|---|---|
| Heat / Fokker–Planck ($\partial_t\rho = \Delta\rho + \nabla\cdot(\rho\nabla V)$) | $W_2$ (Wasserstein) | Entropy $\mathcal{H}$ | (Carlier et al., 2015, Adams et al., 2012, Karatzas et al., 2020) |
| Entropic OT / Sinkhorn-JKO | Sinkhorn divergence | Entropy / general $\mathcal{F}$ | (Hardion et al., 18 Nov 2025, Peyré, 2015) |
| Discrete Markov chains | Discrete transport metric $\mathcal{W}$ | Entropy $\mathcal{H}$ | (Maas, 2011) |
| Boltzmann / Landau equations | Collision/Landau distance | Boltzmann entropy | (Erbar, 2016, Carrillo et al., 2020) |
| Hellinger–Kantorovich reaction-diffusion | HK metric (unbalanced OT) | Free energy | (Kondratyev et al., 2017) |
| Fisher–Rao min-max games | Fisher–Rao (information) | Relative entropy | (Lascu et al., 2024) |
| Perona–Malik, quantum gravity | Info-geometry (operator) | Quantum rel. entropy | (Bianconi, 18 Mar 2025) |
The unifying principle of entropic gradient flow is the identification of a suitable entropy-related functional as the gradient flow generator in a Riemannian or sub-Riemannian metric structure determined by the relevant physical, probabilistic, or geometric constraints. This enables a general framework for evolution equations, variational approximation schemes, and efficient algorithms with strong theoretical guarantees for convergence and stability across diverse settings (Carlier et al., 2015, Hardion et al., 18 Nov 2025, Forkert et al., 2020, Adams et al., 2010, Adams et al., 2012, Beyen et al., 2024).