Wasserstein-2 Distance
- The Wasserstein-2 distance ($W_2$) is a metric on probability measures with finite second moments, defined through a quadratic transport cost for optimally transforming one distribution into another.
- Its geometric structure is founded on Brenier's theorem, enabling unique transport maps and geodesic interpolation in the space of probability measures.
- Numerical methods such as linear programming, PDE approaches, and entropic regularization facilitate its application in imaging, statistical estimation, and quantum state analysis.
The Wasserstein-2 distance ($W_2$), also known as the quadratic Wasserstein or optimal transport distance, is a fundamental metric on the space of probability measures with finite second moments. Originating in the optimal transport problem, $W_2$ quantifies the minimum cost of morphing one probability distribution into another under a quadratic cost function. Its theoretical rigor, metric properties, and computational realizations have motivated extensive research in probability, geometry, statistics, machine learning, and related fields.
1. Formal Definitions and Equivalent Formulations
Let $\mu$, $\nu$ be Borel probability measures on $\mathbb{R}^d$ with finite second moments, i.e., $\int_{\mathbb{R}^d} \|x\|^2 \, d\mu(x) < \infty$ and $\int_{\mathbb{R}^d} \|y\|^2 \, d\nu(y) < \infty$.
Kantorovich (coupling) formulation:
$$W_2^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|^2 \, d\pi(x, y),$$
where $\Pi(\mu, \nu)$ is the set of all couplings (joint distributions on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$) (Snow et al., 2018, Oh et al., 2019, Hertz et al., 19 Dec 2025).
Monge formulation:
$$W_2^2(\mu, \nu) = \inf_{T \,:\, T_\# \mu = \nu} \int_{\mathbb{R}^d} \|x - T(x)\|^2 \, d\mu(x),$$
where $T$ is a measurable transport map and $T_\# \mu$ denotes the push-forward of $\mu$ under $T$ (Snow et al., 2018, Korotin et al., 2019).
Dual (Kantorovich) formulation:
$$W_2^2(\mu, \nu) = \sup_{(\varphi, \psi)} \left\{ \int_{\mathbb{R}^d} \varphi \, d\mu + \int_{\mathbb{R}^d} \psi \, d\nu \right\},$$
where the supremum is over integrable pairs satisfying $\varphi(x) + \psi(y) \le \|x - y\|^2$ for all $x, y$ (Snow et al., 2018, Oh et al., 2019).
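For discrete measures, the Kantorovich formulation is a finite linear program over the coupling matrix. The following minimal sketch solves it with `scipy.optimize.linprog`; the helper name `w2_discrete` and the uniform weights are illustrative, and dedicated solvers (e.g., the POT library used in Section 4) are far more efficient in practice.

```python
import numpy as np
from scipy.optimize import linprog

def w2_discrete(x, y, a, b):
    """Squared W2 between discrete measures sum_i a_i d_{x_i} and sum_j b_j d_{y_j},
    solved as the Kantorovich linear program over couplings pi_ij >= 0."""
    n, m = len(x), len(y)
    # Quadratic cost C_ij = ||x_i - y_j||^2, flattened row-major into the LP objective.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1).ravel()
    # Marginal constraints: row sums equal a, column sums equal b.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # sum_j pi_ij = a_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # sum_i pi_ij = b_j
    res = linprog(C, A_eq=A_eq, b_eq=np.concatenate([a, b]), bounds=(0, None))
    return res.fun  # optimal transport cost = W2^2

rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(loc=1.0, size=(6, 2))
print(w2_discrete(x, y, np.full(5, 1 / 5), np.full(6, 1 / 6)))
```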
In particular, the $W_2$ metric metrizes weak convergence together with convergence of second moments (Arras et al., 2016).
2. Geometric and Analytical Structure
The $W_2$ metric equips the space $\mathcal{P}_2(\mathbb{R}^d)$ of probability measures with finite second moments with a geodesic metric structure. Brenier's theorem ensures that, when $\mu$ is absolutely continuous, the optimal plan for $W_2(\mu, \nu)$ is induced by a (unique) map $T = \nabla \varphi$ for a convex potential $\varphi$ (Snow et al., 2018, Hamm et al., 2023). Displacement interpolation describes geodesics in $(\mathcal{P}_2, W_2)$ as push-forwards along convex combinations of the identity and $T$:
$$\mu_t = \big((1 - t)\,\mathrm{id} + t\, \nabla \varphi\big)_\# \mu, \qquad t \in [0, 1].$$
The dynamic Benamou–Brenier formulation yields
$$W_2^2(\mu, \nu) = \inf_{(\mu_t, v_t)} \int_0^1 \int_{\mathbb{R}^d} \|v_t(x)\|^2 \, d\mu_t(x) \, dt,$$
subject to the continuity equation $\partial_t \mu_t + \nabla \cdot (\mu_t v_t) = 0$ with $\mu_0 = \mu$, $\mu_1 = \nu$ (Hamm et al., 2023).
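In one dimension the Brenier map is the monotone rearrangement of quantiles, so the displacement interpolant can be sampled by convexly combining matched order statistics. A minimal sketch for equal-size, equal-weight samples (the helper name is illustrative):

```python
import numpy as np

def displacement_interpolation_1d(x, y, t):
    """Samples from the McCann interpolant mu_t = ((1-t) id + t T)_# mu for
    equal-weight 1-D samples: the optimal map T pairs sorted samples."""
    return (1 - t) * np.sort(x) + t * np.sort(y)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)   # samples from mu = N(0, 1)
y = rng.normal(4.0, 0.5, size=1000)   # samples from nu = N(4, 0.25)
mid = displacement_interpolation_1d(x, y, 0.5)
print(mid.mean(), mid.std())  # geodesic midpoint: mean ~2.0, std ~0.75
```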
When restricted to finite-dimensional submanifolds of $\mathcal{P}_2(\mathbb{R}^d)$, the $W_2$ metric inherits pullback Riemannian structures, allowing for local linearization and geometric learning (Hamm et al., 2023).
3. Fundamental Properties and Closed-Form Expressions
3.1 Metric Properties
- Nonnegativity: $W_2(\mu, \nu) \ge 0$
- Identity of indiscernibles: $W_2(\mu, \nu) = 0$ iff $\mu = \nu$
- Symmetry: $W_2(\mu, \nu) = W_2(\nu, \mu)$
- Triangle inequality: $W_2(\mu, \nu) \le W_2(\mu, \rho) + W_2(\rho, \nu)$
These establish that $W_2$ is a true metric on $\mathcal{P}_2(\mathbb{R}^d)$ (Snow et al., 2018, Korotin et al., 2019, Wang et al., 2024).
3.2 Explicit Solution: Gaussian Measures
For $\mu = \mathcal{N}(m_1, \Sigma_1)$, $\nu = \mathcal{N}(m_2, \Sigma_2)$,
$$W_2^2(\mu, \nu) = \|m_1 - m_2\|^2 + \mathrm{tr}\!\left( \Sigma_1 + \Sigma_2 - 2 \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \right)$$
(Oh et al., 2019, Hertz et al., 19 Dec 2025). For quantum Gaussian states, a direct analogue with similar structure exists, reducing to the classical formula as $\hbar \to 0$ (Hertz et al., 19 Dec 2025).
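The closed form is direct to evaluate numerically with a matrix square root; a minimal sketch using `scipy.linalg.sqrtm` (the helper name is illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(m1, S1, m2, S2):
    """W2 between N(m1, S1) and N(m2, S2):
    W2^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})."""
    r1 = sqrtm(S1)
    cross = sqrtm(r1 @ S2 @ r1)
    # sqrtm may return negligible imaginary parts from round-off; discard them.
    bures_sq = np.trace(S1 + S2 - 2 * np.real(cross))
    return np.sqrt(np.sum((m1 - m2) ** 2) + bures_sq)

print(w2_gaussian(np.zeros(2), np.eye(2), np.ones(2), 2 * np.eye(2)))
# = sqrt(2 + 2 (sqrt(2) - 1)^2) ~ 1.531
```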
3.3 Shift-Invariant Extension and Decomposition
The relative-translation-invariant Wasserstein-2 ($RW_2$) distance is defined by
$$RW_2(\mu, \nu) = \inf_{s \in \mathbb{R}^d} W_2\big(\mu, (\cdot + s)_\# \nu\big),$$
with the Pythagorean decomposition
$$W_2^2(\mu, \nu) = RW_2^2(\mu, \nu) + \|\bar{\mu} - \bar{\nu}\|^2,$$
where $\bar{\mu}, \bar{\nu}$ are the barycenters (means) of $\mu$ and $\nu$ (Wang et al., 2024).
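For equal-weight one-dimensional samples, $W_2^2$ is the mean squared difference of sorted order statistics, and the optimal relative translation aligns the sample means, so the decomposition can be checked numerically by centering (a minimal sketch; the setup is illustrative):

```python
import numpy as np

def w2_sq_1d(x, y):
    """Squared W2 for equal-weight 1-D samples: match sorted order statistics."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

rng = np.random.default_rng(1)
x = rng.gamma(2.0, 1.0, size=2000)
y = rng.normal(5.0, 2.0, size=2000)

full = w2_sq_1d(x, y)
rw2_sq = w2_sq_1d(x - x.mean(), y - y.mean())  # RW2^2: the best shift aligns the means
print(full, rw2_sq + (x.mean() - y.mean()) ** 2)  # equal, per the decomposition
```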
4. Algorithmic and Statistical Considerations
4.1 Computation
Numerical strategies for computing $W_2$ include:
- Kantorovich LP: Discrete optimization over couplings with cost matrix $C_{ij} = \|x_i - y_j\|^2$ (Snow et al., 2018).
- Monge–Ampère PDE approaches: For absolutely continuous marginals, the problem reduces to computing the gradient of a convex potential solving a Monge–Ampère equation (Snow et al., 2018).
- Sinkhorn–Knopp algorithm: Entropic regularization for scalable and smooth approximations, accelerated for $RW_2$ via closed-form barycenter updates (Wang et al., 2024); see the sketch after this list.
- RKHS embedding: Kernelization of $W_2$ through feature-space covariance and mean computation (Oh et al., 2019).
- Gradient-of-ICNN maps: Learning convex potentials parameterized by input-convex neural networks, yielding high-dimensional transport maps with explicit invertibility guarantees (Korotin et al., 2019).
- Quantum Gaussian case: Covariance-based formula via symplectic invariants (Hertz et al., 19 Dec 2025).
Computational complexity varies: exact Kantorovich LP solvers scale as $O(n^3 \log n)$ in the number of support points $n$, while entropic regularization costs $O(n^2)$ per Sinkhorn iteration.
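To illustrate the exact-versus-entropic trade-off, the sketch below uses the POT library (`pip install pot`): `ot.emd2` solves the Kantorovich LP exactly, while `ot.sinkhorn2` returns the entropically regularized cost. Point-cloud sizes and the regularization strength are illustrative choices.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = rng.normal(loc=0.5, size=(300, 3))
a, b = np.full(200, 1 / 200), np.full(300, 1 / 300)  # uniform weights

M = ot.dist(x, y)                         # cost matrix ||x_i - y_j||^2
exact = ot.emd2(a, b, M)                  # exact LP value (= W2^2)
approx = ot.sinkhorn2(a, b, M, reg=0.05)  # entropic approximation
print(exact, approx)                      # approx -> exact as reg -> 0
```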
4.2 Empirical Convergence and Statistical Estimation
- Normal approximation (CLT): Stein's method and Zolotarev-type ideal metrics bound $W_2$ for sums of locally dependent random variables; explicit rates are available under $m$-dependence and for U-statistics and subgraph counts (Fang, 2018).
- Moment/cumulant matching: Explicit bounds for approximating laws of Wiener chaoses via combinatorial discrepancies (e.g., a generalized Stein discrepancy) (Arras et al., 2016).
- Sample-based estimation: Consistency of empirical $W_2$ distances for manifold recovery in Wasserstein space (Hamm et al., 2023); see the sketch below.
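A quick numerical check of sample-based consistency, comparing the sorted-sample estimator against the known value $W_2(\mathcal{N}(0,1), \mathcal{N}(3,1)) = 3$ (illustrative setup):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in [100, 1000, 10000]:
    x = rng.normal(0.0, 1.0, size=n)
    y = rng.normal(3.0, 1.0, size=n)
    est = np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))  # empirical W2 in 1-D
    print(n, est)  # approaches the true distance 3.0 as n grows
```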
5. Extensions and Generalizations
5.1 Mixed Variable and Path-Space Metrics
Generalized $W_2$ metrics accommodate mixed continuous and categorical random fields, with an empirical local Wasserstein structure used for stochastic neural network training (Xia et al., 7 Jul 2025).
For stochastic processes, the metric extends to trajectory space with time-decoupled and time-coupled functionals, enabling effective SDE parameter recovery (Xia et al., 2024).
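A minimal sketch of a time-decoupled trajectory functional, assuming it averages squared marginal $W_2$ distances over a time grid; the exact functional in (Xia et al., 2024) may differ, and the toy dynamics and names here are illustrative:

```python
import numpy as np

def trajectory_w2_sq(X, Y):
    """Time-decoupled squared W2 between two ensembles of 1-D trajectories of
    shape (n_paths, n_times): at each time step, the marginal W2^2 is computed
    by matching sorted samples, then averaged over the grid. A time-coupled
    variant would instead transport whole paths."""
    return np.mean((np.sort(X, axis=0) - np.sort(Y, axis=0)) ** 2)

rng = np.random.default_rng(0)
def simulate(sigma, n=500, T=50, dt=0.01):
    # Euler-Maruyama for dX = -X dt + sigma dW (an Ornstein-Uhlenbeck process).
    x = np.zeros((n, T))
    for t in range(1, T):
        x[:, t] = x[:, t - 1] * (1 - dt) + sigma * np.sqrt(dt) * rng.normal(size=n)
    return x

print(trajectory_w2_sq(simulate(0.5), simulate(1.0)))  # sensitive to the diffusion term
```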
5.2 Quantum Wasserstein Distance
Quantum generalizations define $W_2$ via transport over density operators and a quadratic quantum cost, preserving the operational cost-minimization interpretation and recovering the classical $W_2$ in the appropriate limit (Hertz et al., 19 Dec 2025).
5.3 Manifold Learning in Wasserstein Space
Intrinsic geometry on finite-dimensional submanifolds of $\mathcal{P}_2(\mathbb{R}^d)$ supports geodesic restrictions, tangent space estimation, and spectral learning (Hamm et al., 2023).
6. Applications
| Application Domain | Context/Description | Reference |
|---|---|---|
| Image Comparison | Pixelwise and PDE/LP-based transport for MNIST, yielding higher classification accuracy than Euclidean or affine-invariant metrics | (Snow et al., 2018) |
| Medical Imaging | RKHS-kernelized $W_2$ for texture-based clustering of CT slices, outperforming classical OT | (Oh et al., 2019) |
| SDE/Dynamical Model Reconstruction | Neural network fitting via a $W_2$-driven loss for stochastic systems, outperforming baseline MMD and likelihood-based losses | (Xia et al., 2024) |
| Domain Adaptation | Input-convex neural network approximation of the optimal $W_2$ map for feature alignment | (Korotin et al., 2019) |
| Quantum Information | Quantum Gaussian $W_2$ for state discrimination and metrology | (Hertz et al., 19 Dec 2025) |
| Empirical Law Approximation | CLT under local dependence and approximation of laws in the second Wiener chaos | (Fang, 2018, Arras et al., 2016) |
| Manifold and Graph Data | Submanifold recovery and Gromov–Wasserstein consistency using sampled distances | (Hamm et al., 2023) |
7. Ongoing Research and Open Problems
- Efficiency and Scalability: Further optimization of entropic or approximate solvers for large-scale/high-dimensional data remains active (Wang et al., 2024).
- Metric Extensions: New variants such as $RW_2$ capture shift-invariant similarity and decompose bias/variance effects in distribution shift contexts (Wang et al., 2024).
- Higher-Order Wasserstein Metrics: Conjectured explicit Berry–Esseen-type bounds for $W_p$ ($p > 2$) under local dependence structures (Fang, 2018).
- Universal Approximation in Learning: Demonstrated in the context of generalized $W_2$, showing end-to-end universal approximation by stochastic neural networks for arbitrarily complex random fields (Xia et al., 7 Jul 2025).
- Quantum and Noncommutative Generalizations: Understanding the full scope of quantum $W_2$ for multimode/non-Gaussian states and its comparison with fidelity, trace distance, or the Bures distance (Hertz et al., 19 Dec 2025).
- Spectral and Manifold Methods: Tangent space extraction and learning with $W_2$ in infinite-dimensional measure spaces (Hamm et al., 2023).