Wasserstein Gradient Flows (WGF)
- Wasserstein Gradient Flows are continuous-time dynamical systems that characterize the steepest descent evolution of functionals over probability measures via the 2-Wasserstein metric.
- Discrete-time schemes like the JKO and forward–backward splitting methods provide practical approximations with provable convergence rates under convexity assumptions.
- This framework bridges optimal transport, PDE analysis, and machine learning, enabling rigorous and scalable optimization in infinite-dimensional spaces.
Wasserstein Gradient Flows (WGF) are continuous-time dynamical systems that characterize the steepest descent evolution of a functional over the space of probability measures endowed with the 2-Wasserstein metric. The WGF framework provides a rigorous, geometrically intrinsic generalization of gradient descent to infinite-dimensional spaces, with foundational relevance across optimal transport, partial differential equations, and probabilistic machine learning.
1. The 2-Wasserstein Space: Metric, Geometry, and Geodesics
The space of Borel probability measures on ℝᵈ with finite second moments, denoted 𝒫₂(ℝᵈ) and equipped with the 2-Wasserstein distance

W₂²(μ, ν) = inf_{π ∈ Π(μ, ν)} ∫ ‖x − y‖² dπ(x, y),

becomes a geodesic metric space, where Π(μ, ν) is the set of couplings of μ and ν. When μ is absolutely continuous, the optimal transport map is given by the gradient of a convex function (Brenier's theorem), and constant-speed geodesics can be constructed as pushforwards via interpolated maps:

μ_t = ((1 − t) Id + t T)_# μ,  t ∈ [0, 1],

where T is the optimal transport map from μ to ν. The geodesic structure is central for defining “steepest descent” in this space (Salim et al., 2020).
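In one dimension the optimal transport map between empirical measures is the monotone (sorted) rearrangement, so the displacement interpolation above can be sketched directly on samples. A minimal illustration (the sample sizes, distributions, and function name are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.normal(-2.0, 1.0, size=1000))   # samples of mu, sorted
y = np.sort(rng.normal(3.0, 0.5, size=1000))    # samples of nu, sorted

# In 1-D, pairing sorted samples realizes the optimal (monotone) map T,
# so the geodesic mu_t is the pushforward by (1 - t) * Id + t * T.
def geodesic(t):
    return (1.0 - t) * x + t * y

# W2 between the endpoints via the quantile (sorted) coupling.
w2 = np.sqrt(np.mean((x - y) ** 2))

# Constant speed: W2(mu_0, mu_t) = t * W2(mu_0, mu_1) for all t.
for t in (0.25, 0.5, 0.75):
    d0t = np.sqrt(np.mean((x - geodesic(t)) ** 2))
    assert abs(d0t - t * w2) < 1e-9
print("constant-speed geodesic verified; W2 =", round(w2, 3))
```

The assertions check exactly the constant-speed property that makes this curve a geodesic in W₂.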
2. Continuous-Time Formulation: Evolution Equation and Variational Characterization
For a given functional F: 𝒫₂(ℝᵈ) → (−∞, +∞] that is λ-convex along generalized geodesics, the curve (μ_t)_{t≥0} solving the Wasserstein gradient flow of F is characterized by the Evolution Variational Inequality (EVI): for all ν ∈ 𝒫₂(ℝᵈ),

(1/2) d/dt W₂²(μ_t, ν) + (λ/2) W₂²(μ_t, ν) ≤ F(ν) − F(μ_t).

Under regularity conditions, this is equivalent to a continuity-equation PDE for the density ρ_t:

∂_t ρ_t = div(ρ_t ∇(δF/δρ)(ρ_t)),

where δF/δρ denotes the first variation of F. For example, if F(ρ) = ∫ V dρ + ∫ ρ log ρ, the gradient flow yields the Fokker-Planck equation ∂_t ρ_t = div(ρ_t ∇V) + Δρ_t, a prototypical diffusive evolution (Salim et al., 2020).
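The Fokker-Planck case can be checked numerically: with V(x) = x²/2, the stationary density of ∂_t ρ = div(ρ∇V) + Δρ is the standard Gaussian. A small explicit finite-difference sketch (grid, step sizes, and initial condition are illustrative choices of mine, not from the paper):

```python
import numpy as np

# Explicit finite-difference sketch of d/dt rho = d/dx(rho * V') + d2/dx2 rho
# with V(x) = x^2 / 2, whose stationary solution is N(0, 1).
L, n = 6.0, 241
x = np.linspace(-L, L, n)
h = x[1] - x[0]
rho = np.exp(-(x - 2.0) ** 2)            # off-center initial density
rho /= rho.sum() * h                     # normalize to unit mass

dt = 0.2 * h ** 2                        # small step for explicit stability
for _ in range(20000):
    flux = rho * x                       # rho * V'(x): the drift flux
    drift = np.zeros_like(rho)
    drift[1:-1] = (flux[2:] - flux[:-2]) / (2 * h)              # central d/dx
    lap = np.zeros_like(rho)
    lap[1:-1] = (rho[2:] - 2 * rho[1:-1] + rho[:-2]) / h ** 2   # Laplacian
    rho = rho + dt * (drift + lap)

mass = rho.sum() * h
mean = (rho * x).sum() * h
var = (rho * x ** 2).sum() * h - mean ** 2
print(round(mass, 3), round(var, 3))     # mass stays ~1, variance -> 1
```

Mass is conserved (the scheme is in divergence form up to negligible boundary flux), and the density relaxes to the stationary Gaussian, as the gradient-flow interpretation predicts.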
3. Discrete-Time Schemes: JKO and Forward-Backward Splitting
The canonical time-discretization of WGF is the Jordan–Kinderlehrer–Otto (JKO) implicit Euler scheme with step size γ > 0:

μ_{k+1} = argmin_{μ ∈ 𝒫₂(ℝᵈ)} F(μ) + (1/(2γ)) W₂²(μ, μ_k).

This yields a sequence (μ_k)_{k≥0} whose piecewise-constant interpolation converges to the continuous WGF as γ → 0.
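As a concrete instance, take F to be the negative entropy H(ρ) = ∫ ρ log ρ, whose gradient flow is the heat equation. Restricted to 1-D centered Gaussians (an illustrative simplification of mine), the JKO subproblem reduces to a scalar minimization in the standard deviation s, namely argmin_s −log s + (s − σ)²/(2γ), with a closed-form solution that matches the exact heat flow to first order in γ:

```python
import math

# One JKO step of the negative entropy, started from N(0, sigma^2) in 1-D
# and restricted to Gaussians: minimize -log s + (s - sigma)^2 / (2 gamma)
# over the new standard deviation s. Setting the derivative to zero gives
# s^2 - sigma * s - gamma = 0, hence the closed form below.
def jko_entropy_std(sigma, gamma):
    return 0.5 * (sigma + math.sqrt(sigma ** 2 + 4.0 * gamma))

sigma = 1.0
for gamma in (1e-1, 1e-2, 1e-3):
    s_jko = jko_entropy_std(sigma, gamma)
    s_heat = math.sqrt(sigma ** 2 + 2.0 * gamma)   # exact heat flow over time gamma
    # Both expand as sigma + gamma / sigma + O(gamma^2): the implicit Euler
    # step agrees with the continuous flow to first order, as expected.
    print(gamma, abs(s_jko - s_heat))
```

The printed gaps shrink like γ², illustrating the sense in which JKO is an implicit Euler discretization of the continuous flow.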
When the objective functional decomposes as F = E + H, with E(μ) = ∫ V dμ smooth (V differentiable with Lipschitz gradient) and H possibly nonsmooth but geodesically convex (e.g., the negative entropy), the Forward–Backward (FB) proximal-gradient algorithm over 𝒫₂(ℝᵈ) is defined as:
- Forward (gradient) step for E: ν_k = (Id − γ∇V)_# μ_k,
- Backward (proximal) step for H: μ_{k+1} = Prox_{γH}(ν_k) = argmin_μ H(μ) + (1/(2γ)) W₂²(μ, ν_k),
mirroring the classical Euclidean proximal-gradient framework. Here, Prox_{γH} is a JKO step for H only (Salim et al., 2020).
4. Convergence Theory for Proximal Splitting and Rates
Suppose E(μ) = ∫ V dμ with V L-smooth and λ-strongly convex (λ ≥ 0), and H is proper, lower semicontinuous, and convex along generalized geodesics. If γ ≤ 1/L, the FB scheme satisfies a discrete EVI of the form

2γ (F(μ_{k+1}) − F(ν)) ≤ (1 − γλ) W₂²(μ_k, ν) − W₂²(μ_{k+1}, ν)  for all ν,

which yields, with μ* a minimizer of F:
- If λ = 0: F(μ_k) − F(μ*) ≤ W₂²(μ_0, μ*)/(2γk), an O(1/k) rate.
- If λ > 0: W₂²(μ_k, μ*) ≤ (1 − γλ)ᵏ W₂²(μ_0, μ*) (linear convergence).
This result establishes WGF-FB as an infinite-dimensional analog of the proximal gradient method, retaining convergence guarantees familiar from convex Euclidean optimization (Salim et al., 2020).
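The linear rate can be observed on a 1-D worked example of mine: take V(x) = a(x − m)²/2 and H the negative entropy, so the minimizer of F = E + H is the Gaussian N(m, 1/a). Each FB step maps a Gaussian to a Gaussian, with the mean and standard deviation updated in closed form:

```python
import math

# Closed-form FB recursion for F = E + H in 1-D, with E(mu) = \int V d mu,
# V(x) = a (x - m)^2 / 2, and H the negative entropy; the minimizer of F
# is N(m, 1/a). Each FB step acts on (mean, std) as:
#   forward (for E):  mean <- mean - gamma a (mean - m); std <- (1 - gamma a) std
#   backward (for H): std <- (std + sqrt(std^2 + 4 gamma)) / 2  (entropy JKO)
# Parameter values are illustrative.
a, m = 2.0, 1.0
gamma = 0.1                        # satisfies gamma <= 1/L with L = a
mean, std = -3.0, 4.0              # initial Gaussian N(-3, 16)

target_std = 1.0 / math.sqrt(a)
dists = []
for _ in range(60):
    mean = mean - gamma * a * (mean - m)                    # forward step
    std = (1.0 - gamma * a) * std
    std = 0.5 * (std + math.sqrt(std ** 2 + 4.0 * gamma))   # backward step
    dists.append(math.hypot(mean - m, std - target_std))    # W2 to N(m, 1/a)

# Geometric W2-convergence: successive error ratios stabilize below 1.
print(dists[-1], dists[-1] / dists[-2])
```

One can check that (m, 1/√a) is the exact fixed point of this recursion, i.e., the discretization targets the true minimizer with no bias, and the error contracts by a constant factor per step, matching the λ > 0 regime above.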
5. Practical Implementation, Computational Aspects, and Examples
Continuous-time WGF enjoys exact decay rates, while discrete-time schemes (JKO, FB) match these rates up to step-size constraints. The main numerical challenge is evaluating the proximal map (the JKO subproblem), which, depending on H, may admit:
- Closed-form solutions (e.g., negative entropy/heat flow),
- PDE-based solvers (for more complex energies),
- Entropic regularization or Sinkhorn algorithms for approximation.
FB splitting reduces the implicit (proximal) computation to the nonsmooth part H only, with the smooth part E handled by a simple push-forward. In the canonical quadratic-plus-entropy example (sampling from a Gaussian), each FB step maintains Gaussianity, and closed-form recursions for the mean and covariance yield linear W₂-convergence. Particle-based (sample-wise) push-forward strategies with an optional heat-flow substep accurately reflect the continuous-time contraction, even in high dimensions (Salim et al., 2020).
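A particle-based splitting of this kind can be sketched in a few lines. Here the forward step pushes particles by (Id − γ∇V), and the entropy is handled by an exact heat-flow substep, which on particles amounts to a Gaussian convolution, i.e., adding N(0, 2γ) noise; the choice V(x) = ‖x‖²/2 (target: standard Gaussian) and all parameter values are illustrative assumptions of mine, not taken from the paper:

```python
import numpy as np

# Particle sketch of a splitting scheme for F(mu) = \int V d mu + entropy,
# with V(x) = ||x||^2 / 2, so the target measure is N(0, I) in d dimensions.
rng = np.random.default_rng(1)
d, n, gamma = 10, 20_000, 0.05
x = rng.normal(5.0, 3.0, size=(n, d))         # particles start far away

def grad_V(x):
    return x                                   # gradient of ||x||^2 / 2

for _ in range(200):
    x = x - gamma * grad_V(x)                  # forward push-forward step
    x = x + np.sqrt(2.0 * gamma) * rng.standard_normal((n, d))  # heat flow

# This splitting has fixed-point variance 1 / (1 - gamma / 2) per coordinate,
# an O(gamma) bias relative to the target N(0, I); the mean is unbiased.
print(round(float(x.mean()), 3), round(float(x.var()), 3))
```

The empirical mean and per-coordinate variance land near (0, 1) even in d = 10, illustrating the dimension-robust contraction described above; shrinking γ shrinks the residual variance bias.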
6. Extensions, Applications, and Open Directions
The FB splitting framework for Wasserstein gradient flows enables:
- Handling composite objectives with both smooth and nonsmooth contributions,
- Direct generalization from Euclidean optimization,
- Provable convergence under geodesic convexity,
- Scalability to high dimensions when approximate or closed-form JKO operators are available.
Ongoing research targets efficient algorithms for more general energy landscapes (including non-convex energies, non-Euclidean underlying domains), adaptive schemes, high-dimensional and large-scale applications, and connections to stochastic optimization and sampling (Salim et al., 2020).
Table: Summary of Classical vs. Proximal-Splitting WGF Schemes
| Method | Iteration Definition | Complexity per Step |
|---|---|---|
| JKO | μ_{k+1} = argmin_μ F(μ) + (1/(2γ)) W₂²(μ, μ_k) | Full proximal step for F (often hard/expensive) |
| FB-Splitting | Pushforward by (Id − γ∇V), then Prox_{γH} | Cheaper: proximal step for H only |
The Wasserstein Proximal Gradient framework thus defines and analyzes an efficient and theoretically well-founded approach to composite optimization over the space of measures, with direct applicability to variational inference, sampling, and PDE evolution models (Salim et al., 2020).