Wasserstein-2 Distance
- The Wasserstein-2 distance ($W_2$) is a metric on probability measures with finite second moments, defined through a quadratic transport cost for optimally transforming one distribution into another.
- Its geometric structure is founded on Brenier's theorem, enabling unique transport maps and geodesic interpolation in the space of probability measures.
- Numerical methods such as linear programming, PDE approaches, and entropic regularization facilitate its application in imaging, statistical estimation, and quantum state analysis.
The Wasserstein-2 distance ($W_2$), also known as the quadratic Wasserstein or optimal transport distance, is a fundamental metric on the space of probability measures with finite second moments. Originating in the optimal transport problem, $W_2$ quantifies the minimum cost of morphing one probability distribution into another under a quadratic cost function. Its theoretical rigor, metric properties, and computational realizations have motivated extensive research in probability, geometry, statistics, machine learning, and related fields.
1. Formal Definitions and Equivalent Formulations
Let $\mu$, $\nu$ be Borel probability measures on $\mathbb{R}^d$ with finite second moments, i.e., $\int_{\mathbb{R}^d} \|x\|^2 \, d\mu(x) < \infty$ and $\int_{\mathbb{R}^d} \|y\|^2 \, d\nu(y) < \infty$.
Kantorovich (coupling) formulation:
$$W_2^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \|x - y\|^2 \, d\pi(x, y),$$
where $\Pi(\mu, \nu)$ is the set of all couplings (joint distributions on $\mathbb{R}^d \times \mathbb{R}^d$ with marginals $\mu$ and $\nu$) (Snow et al., 2018, Oh et al., 2019, Hertz et al., 19 Dec 2025).
Monge formulation:
$$W_2^2(\mu, \nu) = \inf_{T \,:\, T_\# \mu = \nu} \int_{\mathbb{R}^d} \|x - T(x)\|^2 \, d\mu(x),$$
where $T$ is a measurable transport map and $T_\# \mu$ denotes the push-forward of $\mu$ under $T$ (Snow et al., 2018, Korotin et al., 2019).
Dual (Kantorovich) formulation:
$$W_2^2(\mu, \nu) = \sup_{(\varphi, \psi)} \left\{ \int_{\mathbb{R}^d} \varphi \, d\mu + \int_{\mathbb{R}^d} \psi \, d\nu \right\},$$
where the supremum is over integrable pairs satisfying $\varphi(x) + \psi(y) \le \|x - y\|^2$ for all $x, y$ (Snow et al., 2018, Oh et al., 2019).
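For discrete measures, the Kantorovich formulation is a finite linear program over the coupling matrix. The following minimal sketch solves it with `scipy.optimize.linprog`; the helper name `w2_discrete` and the uniform weights are illustrative, and dedicated solvers (e.g., the POT library used in Section 4) are far more efficient in practice.

```python
import numpy as np
from scipy.optimize import linprog

def w2_discrete(x, y, a, b):
    """Squared W2 between discrete measures sum_i a_i d_{x_i} and sum_j b_j d_{y_j},
    solved as the Kantorovich linear program over couplings pi_ij >= 0."""
    n, m = len(x), len(y)
    # Quadratic cost C_ij = ||x_i - y_j||^2, flattened row-major into the LP objective.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1).ravel()
    # Marginal constraints: row sums equal a, column sums equal b.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # sum_j pi_ij = a_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # sum_i pi_ij = b_j
    res = linprog(C, A_eq=A_eq, b_eq=np.concatenate([a, b]), bounds=(0, None))
    return res.fun  # optimal transport cost = W2^2

rng = np.random.default_rng(0)
x, y = rng.normal(size=(5, 2)), rng.normal(loc=1.0, size=(6, 2))
print(w2_discrete(x, y, np.full(5, 1 / 5), np.full(6, 1 / 6)))
```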
In particular, the $W_2$ metric metrizes weak convergence together with convergence of second moments (Arras et al., 2016).
2. Geometric and Analytical Structure
The $W_2$ metric equips the space $\mathcal{P}_2(\mathbb{R}^d)$ of probability measures with finite second moments with a geodesic metric structure. Brenier's theorem ensures that, when $\mu$ is absolutely continuous, the optimal plan for $W_2(\mu, \nu)$ is induced by a (unique) map $T = \nabla \varphi$ for a convex potential $\varphi$ (Snow et al., 2018, Hamm et al., 2023). Displacement interpolation describes geodesics in $(\mathcal{P}_2, W_2)$ as push-forwards along convex combinations of the identity and $T$:
$$\mu_t = \big((1 - t)\,\mathrm{id} + t\, \nabla \varphi\big)_\# \mu, \qquad t \in [0, 1].$$
The dynamic Benamou–Brenier formulation yields
$$W_2^2(\mu, \nu) = \inf_{(\mu_t, v_t)} \int_0^1 \int_{\mathbb{R}^d} \|v_t(x)\|^2 \, d\mu_t(x) \, dt,$$
subject to the continuity equation $\partial_t \mu_t + \nabla \cdot (\mu_t v_t) = 0$ with $\mu_0 = \mu$, $\mu_1 = \nu$ (Hamm et al., 2023).
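In one dimension the Brenier map is the monotone rearrangement of quantiles, so the displacement interpolant can be sampled by convexly combining matched order statistics. A minimal sketch for equal-size, equal-weight samples (the helper name is illustrative):

```python
import numpy as np

def displacement_interpolation_1d(x, y, t):
    """Samples from the McCann interpolant mu_t = ((1-t) id + t T)_# mu for
    equal-weight 1-D samples: the optimal map T pairs sorted samples."""
    return (1 - t) * np.sort(x) + t * np.sort(y)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)   # samples from mu = N(0, 1)
y = rng.normal(4.0, 0.5, size=1000)   # samples from nu = N(4, 0.25)
mid = displacement_interpolation_1d(x, y, 0.5)
print(mid.mean(), mid.std())  # geodesic midpoint: mean ~2.0, std ~0.75
```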
When restricted to finite-dimensional submanifolds of $\mathcal{P}_2(\mathbb{R}^d)$, the $W_2$ metric inherits pullback Riemannian structures, allowing for local linearization and geometric learning (Hamm et al., 2023).
3. Fundamental Properties and Closed-Form Expressions
3.1 Metric Properties
- Nonnegativity: $W_2(\mu, \nu) \ge 0$
- Identity of indiscernibles: $W_2(\mu, \nu) = 0$ iff $\mu = \nu$
- Symmetry: $W_2(\mu, \nu) = W_2(\nu, \mu)$
- Triangle inequality: $W_2(\mu, \nu) \le W_2(\mu, \rho) + W_2(\rho, \nu)$
These establish that $W_2$ is a true metric on $\mathcal{P}_2(\mathbb{R}^d)$ (Snow et al., 2018, Korotin et al., 2019, Wang et al., 2024).
3.2 Explicit Solution: Gaussian Measures
For $\mu = \mathcal{N}(m_1, \Sigma_1)$, $\nu = \mathcal{N}(m_2, \Sigma_2)$,
$$W_2^2(\mu, \nu) = \|m_1 - m_2\|^2 + \mathrm{tr}\!\left( \Sigma_1 + \Sigma_2 - 2 \left( \Sigma_1^{1/2} \Sigma_2 \Sigma_1^{1/2} \right)^{1/2} \right)$$
(Oh et al., 2019, Hertz et al., 19 Dec 2025). For quantum Gaussian states, a direct analogue with similar structure exists, reducing to the classical formula as $\hbar \to 0$ (Hertz et al., 19 Dec 2025).
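The closed form is direct to evaluate numerically with a matrix square root; a minimal sketch using `scipy.linalg.sqrtm` (the helper name is illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(m1, S1, m2, S2):
    """W2 between N(m1, S1) and N(m2, S2):
    W2^2 = ||m1 - m2||^2 + tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})."""
    r1 = sqrtm(S1)
    cross = sqrtm(r1 @ S2 @ r1)
    # sqrtm may return negligible imaginary parts from round-off; discard them.
    bures_sq = np.trace(S1 + S2 - 2 * np.real(cross))
    return np.sqrt(np.sum((m1 - m2) ** 2) + bures_sq)

print(w2_gaussian(np.zeros(2), np.eye(2), np.ones(2), 2 * np.eye(2)))
# = sqrt(2 + 2 (sqrt(2) - 1)^2) ~ 1.531
```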
3.3 Shift-Invariant Extension and Decomposition
The relative-translation-invariant Wasserstein-2 ($RW_2$) distance is defined by
$$RW_2(\mu, \nu) = \inf_{s \in \mathbb{R}^d} W_2\big(\mu, (\cdot + s)_\# \nu\big),$$
with the Pythagorean decomposition
$$W_2^2(\mu, \nu) = RW_2^2(\mu, \nu) + \|\bar{\mu} - \bar{\nu}\|^2,$$
where $\bar{\mu}, \bar{\nu}$ are the barycenters (means) of $\mu$ and $\nu$ (Wang et al., 2024).
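For equal-weight one-dimensional samples, $W_2^2$ is the mean squared difference of sorted order statistics, and the optimal relative translation aligns the sample means, so the decomposition can be checked numerically by centering (a minimal sketch; the setup is illustrative):

```python
import numpy as np

def w2_sq_1d(x, y):
    """Squared W2 for equal-weight 1-D samples: match sorted order statistics."""
    return np.mean((np.sort(x) - np.sort(y)) ** 2)

rng = np.random.default_rng(1)
x = rng.gamma(2.0, 1.0, size=2000)
y = rng.normal(5.0, 2.0, size=2000)

full = w2_sq_1d(x, y)
rw2_sq = w2_sq_1d(x - x.mean(), y - y.mean())  # RW2^2: the best shift aligns the means
print(full, rw2_sq + (x.mean() - y.mean()) ** 2)  # equal, per the decomposition
```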
4. Algorithmic and Statistical Considerations
4.1 Computation
Numerical strategies for computing $W_2$ include:
- Kantorovich LP: Discrete optimization over couplings with cost matrix $C_{ij} = \|x_i - y_j\|^2$ (Snow et al., 2018).
- Monge–Ampère PDE approaches: For absolutely continuous marginals, the problem reduces to computing the gradient of a convex potential solving a Monge–Ampère equation (Snow et al., 2018).
- Sinkhorn–Knopp algorithm: Entropic regularization for scalable and smooth approximations, accelerated for $RW_2$ via closed-form barycenter updates (Wang et al., 2024); see the sketch after this list.
- RKHS embedding: Kernelization of $W_2$ through feature-space covariance and mean computation (Oh et al., 2019).
- Gradient-of-ICNN maps: Learning convex potentials parameterized by input-convex neural networks, yielding high-dimensional transport maps with explicit invertibility guarantees (Korotin et al., 2019).
- Quantum Gaussian case: Covariance-based formula via symplectic invariants (Hertz et al., 19 Dec 2025).
Computational complexity varies: exact Kantorovich LP solvers scale as $O(n^3 \log n)$ in the number of support points $n$, while entropic regularization costs $O(n^2)$ per Sinkhorn iteration.
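To illustrate the exact-versus-entropic trade-off, the sketch below uses the POT library (`pip install pot`): `ot.emd2` solves the Kantorovich LP exactly, while `ot.sinkhorn2` returns the entropically regularized cost. Point-cloud sizes and the regularization strength are illustrative choices.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = rng.normal(loc=0.5, size=(300, 3))
a, b = np.full(200, 1 / 200), np.full(300, 1 / 300)  # uniform weights

M = ot.dist(x, y)                         # cost matrix ||x_i - y_j||^2
exact = ot.emd2(a, b, M)                  # exact LP value (= W2^2)
approx = ot.sinkhorn2(a, b, M, reg=0.05)  # entropic approximation
print(exact, approx)                      # approx -> exact as reg -> 0
```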
4.2 Empirical Convergence and Statistical Estimation
- Normal approximation (CLT): Stein's method and Zolotarev-type ideal metrics bound $W_2$ for sums of locally dependent random variables; explicit rates are available under $m$-dependence and for U-statistics and subgraph counts (Fang, 2018).
- Moment/cumulant matching: Explicit bounds for approximating laws of Wiener chaoses via combinatorial discrepancies (e.g., a generalized Stein discrepancy) (Arras et al., 2016).
- Sample-based estimation: Consistency of empirical $W_2$ distances for manifold recovery in Wasserstein space (Hamm et al., 2023); see the sketch below.
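A quick numerical check of sample-based consistency, comparing the sorted-sample estimator against the known value $W_2(\mathcal{N}(0,1), \mathcal{N}(3,1)) = 3$ (illustrative setup):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in [100, 1000, 10000]:
    x = rng.normal(0.0, 1.0, size=n)
    y = rng.normal(3.0, 1.0, size=n)
    est = np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))  # empirical W2 in 1-D
    print(n, est)  # approaches the true distance 3.0 as n grows
```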
5. Extensions and Generalizations
5.1 Mixed Variable and Path-Space Metrics
Generalized $W_2$ metrics accommodate mixed continuous and categorical random fields, with an empirical local Wasserstein structure used for stochastic neural network training (Xia et al., 7 Jul 2025).
For stochastic processes, the metric extends to trajectory space with time-decoupled and time-coupled functionals, enabling effective SDE parameter recovery (Xia et al., 2024).
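A minimal sketch of a time-decoupled trajectory functional, assuming it averages squared marginal $W_2$ distances over a time grid; the exact functional in (Xia et al., 2024) may differ, and the toy dynamics and names here are illustrative:

```python
import numpy as np

def trajectory_w2_sq(X, Y):
    """Time-decoupled squared W2 between two ensembles of 1-D trajectories of
    shape (n_paths, n_times): at each time step, the marginal W2^2 is computed
    by matching sorted samples, then averaged over the grid. A time-coupled
    variant would instead transport whole paths."""
    return np.mean((np.sort(X, axis=0) - np.sort(Y, axis=0)) ** 2)

rng = np.random.default_rng(0)
def simulate(sigma, n=500, T=50, dt=0.01):
    # Euler-Maruyama for dX = -X dt + sigma dW (an Ornstein-Uhlenbeck process).
    x = np.zeros((n, T))
    for t in range(1, T):
        x[:, t] = x[:, t - 1] * (1 - dt) + sigma * np.sqrt(dt) * rng.normal(size=n)
    return x

print(trajectory_w2_sq(simulate(0.5), simulate(1.0)))  # sensitive to the diffusion term
```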
5.2 Quantum Wasserstein Distance
Quantum generalizations define $W_2$ via transport over density operators and a quadratic quantum cost, preserving the operational cost-minimization interpretation and recovering the classical $W_2$ in the appropriate limit (Hertz et al., 19 Dec 2025).
5.3 Manifold Learning in Wasserstein Space
Intrinsic geometry on finite-dimensional submanifolds of $\mathcal{P}_2(\mathbb{R}^d)$ supports geodesic restrictions, tangent space estimation, and spectral learning (Hamm et al., 2023).
6. Applications
| Application Domain | Context/Description | Reference |
|---|---|---|
| Image Comparison | Pixelwise and PDE/LP-based transport for MNIST, yielding higher classification accuracy than Euclidean or affine-invariant metrics | (Snow et al., 2018) |
| Medical Imaging | RKHS-kernelized $W_2$ for texture-based clustering of CT slices, outperforming classical OT | (Oh et al., 2019) |
| SDE/Dynamical Model Reconstruction | Neural network fitting via a $W_2$-driven loss for stochastic systems, outperforming baseline MMD and likelihood-based losses | (Xia et al., 2024) |
| Domain Adaptation | Input-convex neural network approximation of the optimal $W_2$ map for feature alignment | (Korotin et al., 2019) |
| Quantum Information | Quantum Gaussian $W_2$ for state discrimination and metrology | (Hertz et al., 19 Dec 2025) |
| Empirical Law Approximation | CLT under local dependence and approximation of laws in the second Wiener chaos | (Fang, 2018, Arras et al., 2016) |
| Manifold and Graph Data | Submanifold recovery and Gromov–Wasserstein consistency using sampled distances | (Hamm et al., 2023) |
7. Ongoing Research and Open Problems
- Efficiency and Scalability: Further optimization of entropic or approximate solvers for large-scale/high-dimensional data remains active (Wang et al., 2024).
- Metric Extensions: New variants such as $RW_2$ capture shift-invariant similarity and decompose bias/variance effects in distribution shift contexts (Wang et al., 2024).
- Higher-Order Wasserstein Metrics: Conjectured explicit Berry–Esseen-type bounds for $W_p$ ($p > 2$) under local dependence structures (Fang, 2018).
- Universal Approximation in Learning: Demonstrated in the context of generalized $W_2$, showing end-to-end universal approximation by stochastic neural networks for arbitrarily complex random fields (Xia et al., 7 Jul 2025).
- Quantum and Noncommutative Generalizations: Understanding the full scope of quantum $W_2$ for multimode/non-Gaussian states and its comparison with fidelity, trace distance, or the Bures distance (Hertz et al., 19 Dec 2025).
- Spectral and Manifold Methods: Tangent space extraction and learning with $W_2$ in infinite-dimensional measure spaces (Hamm et al., 2023).