Gradient Flow and Particle Models
- Gradient flows evolve probability densities by steepest descent of energy functionals in metric spaces of measures.
- Particle models approximate these flows by tracking interacting particles, preserving the variational structure and admitting convergence guarantees under suitable conditions.
- Recent methodologies integrate neural networks and advanced numerical schemes to enhance scalability and accuracy in high-dimensional applications.
Gradient flows describe the dynamical evolution of probability densities as steepest descent in the space of measures endowed with a chosen metric. Particle models are finite-dimensional approximations that represent such flows by the trajectories of interacting particles, enabling both analysis and high-dimensional computation. This article synthesizes foundational formulations, Lagrangian and Eulerian perspectives, central particle schemes (notably those based on the Jordan–Kinderlehrer–Otto (JKO) scheme and its generalizations), convergence properties, and advanced neural-network-based and scalable methodologies for high-dimensional problems.
1. Variational Gradient Flow Formulations
The prototypical mathematical framework consists of a free energy functional on densities, for instance
$$\mathcal{F}[\rho] = \int F(\rho(x))\,dx + \int V(x)\,\rho(x)\,dx + \frac{1}{2}\iint W(x-y)\,\rho(x)\,\rho(y)\,dx\,dy,$$
where $F$ encodes internal (e.g., entropic or porous-medium) energy, $V$ is a potential, and $W$ describes interaction.
The archetype of implicit variational time discretization is the JKO scheme
$$\rho^{k+1} \in \operatorname*{arg\,min}_{\rho} \; \frac{1}{2\tau}\, W_2^2(\rho, \rho^k) + \mathcal{F}[\rho],$$
where $W_2$ is the $2$-Wasserstein distance and $\tau > 0$ is the time step (Lee et al., 2023).
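As a concrete illustration, here is a minimal NumPy sketch of one JKO step solved over particle positions, assuming a free energy with only potential and interaction terms (so it is directly estimable from particles); the quadratic potential, Gaussian interaction, and plain gradient-descent inner solver are illustrative choices, not the cited scheme's implementation.

```python
import numpy as np

def jko_step(x_prev, grad_V, grad_W, tau=0.1, lr=1e-2, n_iters=500):
    """One approximate JKO step on particle positions: minimize
    (1/(2*tau)) * mean |x_i - x_prev_i|^2 + mean V(x_i)
    + (1/(2N^2)) sum_{i,j} W(x_i - x_j) by gradient descent (W assumed even)."""
    x = x_prev.copy()
    for _ in range(n_iters):
        diffs = x[:, None, :] - x[None, :, :]   # pairwise x_i - x_j, shape (n, n, d)
        g = (x - x_prev) / tau                  # gradient of the movement penalty
        g += grad_V(x)                          # gradient of the potential term
        g += grad_W(diffs).mean(axis=1)         # gradient of the interaction term
        x -= lr * g
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 2))
for _ in range(20):                             # 20 outer JKO steps
    x = jko_step(x, grad_V=lambda z: z,         # V(x) = |x|^2 / 2
                 grad_W=lambda z: -z * np.exp(-(z ** 2).sum(-1, keepdims=True)))
```

Each outer iteration plays the role of one implicit time step; the inner loop only approximates the minimizer, which is the usual trade-off in practical JKO-type solvers.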
Gradient flows on $(\mathcal{P}_p(\mathbb{R}^d), W_p)$ for $p \in (1, \infty)$ are governed by doubly nonlinear diffusion (for convex, superlinear internal energies), yielding PDEs such as
$$\partial_t \rho = \nabla \cdot \left( \rho \left| \nabla \frac{\delta \mathcal{F}}{\delta \rho} \right|^{q-2} \nabla \frac{\delta \mathcal{F}}{\delta \rho} \right),$$
with the dual exponent $q = p/(p-1)$ (Lei, 7 Jan 2025).
Particle models are constructed to discretize these flows while preserving variational structure at the discrete level.
2. Lagrangian Reformulation and Particle Models
Instead of directly approximating densities, one pushes forward a set of particles along a transport induced by a velocity field. In the Lagrangian picture, the transport map $T_t$ evolves under
$$\partial_t T_t(x) = v_t(T_t(x)), \qquad T_0 = \mathrm{id},$$
with the evolved density given by the pushforward $\rho_t = (T_t)_\# \rho_0$ (Lee et al., 2023).
The evolution of the determinant of the Jacobian relates to the divergence of the velocity:
$$\frac{d}{dt} \log \det \nabla T_t(x) = (\nabla \cdot v_t)(T_t(x)).$$
For practical reasons, the velocity is often parameterized directly or as the gradient of a neural-network-parameterized potential $\Phi_\theta$, so that $v_t = \nabla \Phi_\theta$.
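A minimal NumPy sketch of this pushforward with log-density tracking, assuming the analytically known velocity field $v(x) = -x$ (the gradient flow of $V(x) = |x|^2/2$, whose divergence is constant), purely for illustration:

```python
import numpy as np

def push_forward(x0, logp0, n_steps=100, dt=0.01):
    """Push particles forward under v(x) = -x and track log-densities via
    d/dt log rho(x_t) = -div v(x_t); here div v = -d everywhere."""
    x, logp = x0.copy(), logp0.copy()
    d = x.shape[1]
    for _ in range(n_steps):
        x = x + dt * (-x)       # forward Euler particle update
        logp = logp + dt * d    # -div v = +d, so log rho increases as mass contracts
    return x, logp

rng = np.random.default_rng(0)
x0 = rng.normal(size=(1000, 2))
logp0 = -0.5 * (x0 ** 2).sum(axis=1) - np.log(2 * np.pi)  # standard normal in 2D
xT, logpT = push_forward(x0, logp0)
```

For a learned velocity field, the analytic divergence is replaced by an automatic-differentiation estimate, as in the neural sketch in Section 3.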
Particle models also arise from geometric discretizations, such as nonoverlapping balls or Laguerre (weighted Voronoi) cells, which grant explicit expressions for the induced discrete energy and its gradients (Lei, 7 Jan 2025, Natale, 2023). For nonoverlapping balls in 1D, the discrete energy takes the cell-based form
$$E_N(x_1, \dots, x_N) = \sum_{i=1}^{N} |C_i|\, F\!\left(\frac{m_i}{|C_i|}\right),$$
with $C_i$ the cell associated to the particle $x_i$ and $m_i$ its mass.
Gradient flows for these energies translate into ODEs on particle positions,
$$\dot{x}_i = -\frac{1}{m_i}\, \nabla_{x_i} E_N(x_1, \dots, x_N),$$
or, more generally, $\dot{x}_i = -\frac{1}{m_i}\, J_q\big(\nabla_{x_i} E_N\big)$ with the duality mapping $J_q(\xi) = |\xi|^{q-2}\xi$ for general $p$-geometry (Lei, 7 Jan 2025).
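A minimal 1D sketch of such a cell-based particle ODE, assuming Voronoi-style cells, unit masses, a porous-medium-type internal energy $F(s) = s^2$, one-sided boundary cells, and a finite-difference gradient; all of these are illustrative conventions rather than the cited construction.

```python
import numpy as np

def cell_energy(x, mass=1.0):
    """Discrete internal energy sum_i |C_i| * F(mass/|C_i|) with F(s) = s**2,
    over 1D Voronoi cells of sorted particles (one-sided widths at the ends)."""
    x = np.sort(x)
    widths = np.empty_like(x)
    widths[1:-1] = 0.5 * (x[2:] - x[:-2])   # interior cells span adjacent midpoints
    widths[0], widths[-1] = x[1] - x[0], x[-1] - x[-2]
    return np.sum(widths * (mass / widths) ** 2)

def grad_fd(E, x, eps=1e-6):
    """Central finite-difference gradient; analytic gradients exist, but this
    keeps the sketch short."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (E(x + e) - E(x - e)) / (2 * eps)
    return g

# Forward Euler on the particle ODE  dx_i/dt = -(1/m_i) dE_N/dx_i  (unit masses)
x = np.sort(np.random.default_rng(0).normal(size=64))
for _ in range(500):
    x -= 1e-3 * grad_fd(cell_energy, x)
```

With $F(s) = s^2$ the energy reduces to a repulsive sum of inverse cell widths, so the particles spread, mimicking porous-medium diffusion at the discrete level.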
3. Algorithmic Realizations and Loss Functions
A significant class of contemporary particle methods adopts neural-network-based parameterizations and neural ODE frameworks (Lee et al., 2023, Cheng et al., 2023, Dong et al., 2022, Zhang et al., 2024). For implicit-in-time JKO steps:
- Draw samples from the current iterate $\rho^k$.
- Evolve these under forward Euler or higher-order ODE solvers, simultaneously updating the density via instantaneous Jacobian determinants.
- The loss function (for linear mobility) is
  $$L(\theta) = \frac{1}{2\tau}\,\frac{1}{N}\sum_{i=1}^{N} \big| x_i(\tau) - x_i(0) \big|^2 + \mathcal{F}\big[\rho_\theta(\tau)\big],$$
  with the free energy estimated along the evolved particles (see the sketch after this list).
- Update the neural network parameters $\theta$ via stochastic gradient descent or Adam.
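A minimal PyTorch sketch of these steps under simplifying assumptions: a potential-plus-entropy free energy $\mathcal{F}[\rho] = \mathbb{E}[V] + \mathbb{E}[\log\rho]$, a velocity given by the gradient of a small MLP potential, and exact divergences via autograd. The names (`jko_loss`, `phi`, `V`) are illustrative, not the cited papers' code.

```python
import math
import torch

def jko_loss(phi, x0, logp0, V, tau=0.1, n_steps=10):
    """Evolve particles under v = grad(phi) with forward Euler, track log-densities
    via the divergence, and return the JKO objective: transport cost + free energy."""
    dt = tau / n_steps
    x, logp = x0.requires_grad_(True), logp0
    for _ in range(n_steps):
        v = torch.autograd.grad(phi(x).sum(), x, create_graph=True)[0]
        div = sum(torch.autograd.grad(v[:, j].sum(), x, create_graph=True)[0][:, j]
                  for j in range(x.shape[1]))   # exact divergence, O(d) autograd passes
        x = x + dt * v                          # forward Euler particle update
        logp = logp - dt * div                  # continuity equation along trajectories
    transport = ((x - x0) ** 2).sum(dim=1).mean() / (2 * tau)
    energy = (V(x) + logp).mean()               # E[V] + entropy estimate E[log rho]
    return transport + energy

phi = torch.nn.Sequential(torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1))
opt = torch.optim.Adam(phi.parameters(), lr=1e-3)
base = torch.randn(256, 2)
logp0 = -0.5 * (base ** 2).sum(dim=1) - math.log(2 * math.pi)  # N(0, I) in 2D
for _ in range(200):
    opt.zero_grad()
    loss = jko_loss(phi, base.clone(), logp0,
                    V=lambda z: 0.5 * ((z - 2.0) ** 2).sum(dim=1))
    loss.backward()
    opt.step()
```

The exact divergence is affordable only in low dimension; high-dimensional implementations typically substitute stochastic trace estimators.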
For generalized Wasserstein geometries (Cheng et al., 2023), the gradient flow of the Kullback–Leibler divergence in an $h$-induced geometry produces the velocity field
$$v_t = -\nabla h^*\!\left( \nabla \log \frac{\rho_t}{\pi} \right),$$
where $h^*$ is the Legendre conjugate of the regularizer $h$, allowing for adaptive and structural choices in the induced geometry.
Particle approximations represent the density by the empirical measure $\rho_N = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}$, and velocity fields are learned via neural nets or kernelized function classes (as in SVGD, PFG, and Radon–Wasserstein flows) (Hess-Childs et al., 5 Feb 2026, Dong et al., 2022, Cheng et al., 2023, Liu, 2017).
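For concreteness, here is the standard SVGD update (Liu, 2017) in compact NumPy form; the RBF kernel and fixed bandwidth are illustrative choices, and adaptive bandwidth rules (e.g., the median heuristic) are common in practice.

```python
import numpy as np

def svgd_step(x, grad_log_pi, h=1.0, lr=0.1):
    """One SVGD update: phi(x_i) = mean_j [ k(x_j, x_i) grad log pi(x_j)
    + grad_{x_j} k(x_j, x_i) ], with RBF kernel k of bandwidth h."""
    n = x.shape[0]
    diffs = x[:, None, :] - x[None, :, :]                  # x_i - x_j, shape (n, n, d)
    K = np.exp(-(diffs ** 2).sum(-1) / (2 * h ** 2))       # kernel matrix
    drive = K @ grad_log_pi(x) / n                         # kernel-weighted scores
    repulse = (K[:, :, None] * diffs).mean(axis=1) / h ** 2  # kernel-gradient repulsion
    return x + lr * (drive + repulse)

x = np.random.default_rng(0).normal(size=(200, 2))
for _ in range(500):
    x = svgd_step(x, grad_log_pi=lambda z: -z)             # target: standard normal
```

The driving term transports particles toward high-density regions of $\pi$, while the repulsive term prevents collapse, which is exactly the KL-descent structure referenced in the table of Section 6.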
4. Analytical Properties and Convergence Results
Particle models can be shown to converge to continuum flows under suitable scaling and regularity conditions:
- Gamma-convergence of the discrete energy (for nonoverlapping balls or Voronoi volumes) to its continuum counterpart (Lei, 7 Jan 2025).
- Serfaty's framework for convergence of gradient flows in metric spaces is key for rigorously passing from particle ODEs to the Wasserstein gradient flow PDE (Lei, 7 Jan 2025).
- Energy dissipation is preserved at the particle level (Lee et al., 2023, Carrillo et al., 2015).
- For strong convexity or log-concavity scenarios, explicit exponential rates can be established (Caprio et al., 2024, Kuntz et al., 2022); a representative statement follows this list.
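As a representative instance (a standard consequence of $\lambda$-geodesic convexity with $\lambda > 0$, stated here for orientation rather than taken from the cited works):
$$\mathcal{F}[\rho_t] - \mathcal{F}[\rho^\ast] \le e^{-2\lambda t}\,\big(\mathcal{F}[\rho_0] - \mathcal{F}[\rho^\ast]\big), \qquad W_2(\rho_t, \rho^\ast) \le e^{-\lambda t}\, W_2(\rho_0, \rho^\ast).$$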
Limitations include the difficulty of extending convexity-based proofs to higher dimensions, the requirement of well-prepared initial data (e.g., uniform particle spacing), and the lack of sharp quantitative rates in most nontrivial settings.
5. Nonlinear Mobility, Regularization, and High-Dimensional Flows
Gradient flows with nonlinear mobility or in non-Euclidean geometries expand the applicability of particle models: the continuity equation becomes
$$\partial_t \rho = \nabla \cdot \left( M(\rho)\, \nabla \frac{\delta \mathcal{F}}{\delta \rho} \right)$$
with a nonlinear mobility $M(\rho)$, leading to modified variational costs and weighted particle motion in Lagrangian frameworks (Lee et al., 2023, Carrillo et al., 2015).
Alternative regularizations, such as preconditioned or functional flows, enable adaptation to ill-conditioning and high-dimensional state spaces. The incorporation of neural architectures in velocity parameterizations or score networks is critical to achieving scalability and approximation power, avoiding kernel methods' curse of dimensionality (Dong et al., 2022, Lee et al., 2023, Hess-Childs et al., 5 Feb 2026).
Radon–Wasserstein gradient flows impose a geometry where velocities depend only on 1D projections, yielding algorithms with $O(N \log N)$ complexity per step rather than $O(N^2)$ (as in SVGD), thus enabling tractable high-dimensional sampling (Hess-Childs et al., 5 Feb 2026).
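To make the projection idea concrete, here is a minimal NumPy sketch in the spirit of projection-based flows: a sliced, sorting-based step toward target samples. This is an illustrative analogue, not the Radon–Wasserstein algorithm of the cited paper, and it assumes equal numbers of source and target samples.

```python
import numpy as np

def projected_flow_step(x, y, n_dirs=32, lr=0.1, rng=np.random.default_rng(0)):
    """Move particles x toward target samples y using velocities that depend only
    on 1D projections: per direction, match sorted projections (1D optimal
    transport by sorting, O(N log N) per direction)."""
    n, d = x.shape
    v = np.zeros_like(x)
    for _ in range(n_dirs):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)          # random unit direction
        px, py = x @ theta, y @ theta           # 1D projections
        ix, iy = np.argsort(px), np.argsort(py)
        disp = np.empty(n)
        disp[ix] = py[iy] - px[ix]              # 1D OT displacement along theta
        v += disp[:, None] * theta / n_dirs
    return x + lr * v
```

The per-direction cost is dominated by the sorts, which is the source of the near-linear scaling in the particle count discussed above.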
6. Applications and Extensions
Gradient flow particle models have been applied and benchmarked in:
- Fokker–Planck and aggregation-diffusion equations (e.g., porous medium, Kalman–Wasserstein flows),
- Nonlocal interaction models (including noisy nonlocal aggregation and biological/collective systems) (Yang et al., 3 Feb 2026),
- Variational inference (ParVI, GWG, SIFG, PFG),
- Bayesian inverse problems, Bayesian neural networks, and latent variable EM (Lee et al., 2023, Cheng et al., 2023, Zhang et al., 2024, Kuntz et al., 2022),
- Generative modeling (score-based diffusion, GANs as particle flows) (Franceschi et al., 2023).
Particle schemes reliably capture mass-preservation, energy decay, metastability (including Arrhenius and Eyring–Kramers rates), entropy-regularized effects (degenerate LSI/PLI), and formation of singularities or clusters under sticky dynamics (Monmarché, 18 Oct 2025, Galtung, 2024).
A practical summary of methods and their core features:
| Method | Geometry / Regularizer | Scalability | Neuralization | Convergence Guarantee |
|---|---|---|---|---|
| Deep JKO (Lee et al., 2023) | Wasserstein-2 / nonlinear mobility | Yes | Yes | Unconditional energy decay |
| SVGD (Liu, 2017) | RKHS (Stein operator) | Yes | No | KL descent, weak convergence |
| GWG (Cheng et al., 2023) | General $p$-Wasserstein | Yes | Yes | Strong, rate-matched to Langevin |
| PFG (Dong et al., 2022) | Data-adaptive preconditioning | Yes | Yes | Linear KL decay (ill-conditioned settings) |
| Radon-Wass. (Hess-Childs et al., 5 Feb 2026) | 1D-projection-based | Yes | No | Well-posed, mean-field limit |
| SIFG (Zhang et al., 2024) | Semi-implicit Gaussian family | Yes | Yes | Non-asymptotic, adaptive noise |
Extensions and active research questions include quantitative rates in multi-dimensional setups, structure-preserving discretization for broader families, extension of convergence frameworks to flows on manifolds or non-Euclidean spaces, and scalable implementations for large particle numbers and high dimensionality (Lee et al., 2023, Lei, 7 Jan 2025, Hess-Childs et al., 5 Feb 2026).