Wasserstein-2 Distance

Updated 23 January 2026
  • Wasserstein-2 Distance is a metric on probability measures with finite second moments, defined via quadratic cost functions to optimally transform one distribution into another.
  • Its geometric structure is founded on Brenier's theorem, enabling unique transport maps and geodesic interpolation in the space of probability measures.
  • Numerical methods such as linear programming, PDE approaches, and entropic regularization facilitate its application in imaging, statistical estimation, and quantum state analysis.

The Wasserstein-2 distance ($W_2$), also known as the quadratic Wasserstein or $L^2$ optimal transport distance, is a fundamental metric on the space of probability measures with finite second moments. Originating from the optimal transport problem, $W_2$ quantifies the minimum cost of morphing one probability distribution into another under a quadratic cost function. Its theoretical rigor, metric properties, and computational realizations have motivated extensive research in probability, geometry, statistics, machine learning, and related fields.

1. Formal Definitions and Equivalent Formulations

Let $\mu$, $\nu$ be Borel probability measures on $\mathbb{R}^d$ with finite second moments, i.e., $\int \|x\|^2\,d\mu(x)<\infty$ and $\int\|y\|^2\,d\nu(y)<\infty$.

Kantorovich (coupling) formulation:

$$W_2^2(\mu,\nu) = \inf_{\pi\in\Pi(\mu,\nu)} \int_{\mathbb{R}^d\times\mathbb{R}^d}\|x-y\|^2\,d\pi(x,y)$$

where $\Pi(\mu,\nu)$ is the set of all couplings, i.e., joint distributions on $\mathbb{R}^d\times\mathbb{R}^d$ with marginals $\mu$ and $\nu$ (Snow et al., 2018, Oh et al., 2019, Hertz et al., 19 Dec 2025).

Monge formulation:

$$W_2^2(\mu,\nu) = \inf_{T:\,T_\#\mu=\nu} \int_{\mathbb{R}^d} \|x-T(x)\|^2\,d\mu(x)$$

where $T$ is a measurable transport map and $T_\#\mu=\nu$ denotes the push-forward of $\mu$ under $T$ (Snow et al., 2018, Korotin et al., 2019).
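For equal-weight empirical measures on $n$ points each, the Kantorovich LP collapses to an optimal assignment problem on the quadratic cost matrix, which can be sketched as follows (a minimal illustration using SciPy's assignment solver; the function name `w2_empirical` is ours):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def w2_empirical(x, y):
    """W_2 between two equal-weight empirical measures (n points each).

    With uniform weights the Kantorovich LP reduces to an optimal
    assignment over the quadratic cost matrix c_ij = ||x_i - y_j||^2.
    """
    c = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    rows, cols = linear_sum_assignment(c)  # minimum-cost perfect matching
    return np.sqrt(c[rows, cols].mean())

# In 1D the optimal coupling is monotone: sorted x matches sorted y.
x = np.array([[0.0], [1.0], [2.0]])
y = np.array([[0.5], [1.5], [2.5]])
print(w2_empirical(x, y))  # 0.5: every point moves by 0.5
```

For unequal weights or supports of different sizes the full LP (or an entropic solver, see Section 4) is needed instead of the assignment reduction.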

Dual (Kantorovich) formulation:

$$W_2^2(\mu,\nu) = \sup_{u\in C_b(\mathbb{R}^d)} \left\{ \int u(x)\,d\mu(x) + \int u^{c}(y)\,d\nu(y) \right\}$$

where $u^c(y)=\inf_x \{\|x-y\|^2 - u(x)\}$ is the $c$-transform of $u$ (Snow et al., 2018, Oh et al., 2019).

In particular, the $W_2$ metric metrizes weak convergence together with convergence of second moments (Arras et al., 2016).

2. Geometric and Analytical Structure

The $W_2$ metric equips the space $\mathcal{P}_2(\mathbb{R}^d)$ of probability measures with finite second moments with a geodesic metric structure. Brenier's theorem ensures that, under absolute continuity, the optimal plan for $W_2$ is induced by a (unique) map $T = \nabla \phi$ for a convex potential $\phi$ (Snow et al., 2018, Hamm et al., 2023). Displacement interpolation and the dynamic Benamou–Brenier characterization describe geodesics in $\mathcal{P}_2(\mathbb{R}^d)$ as push-forwards of convex combinations of the identity and $T$:

$$\mu_t = \big( (1-t)\operatorname{Id} + t T \big)_\# \mu \qquad (t\in[0,1])$$

The dynamic formulation yields

$$W_2(\mu,\nu) = \inf_{(\rho,v)}\left(\int_0^1 \|v(t)\|_{L^2(\rho(t))}^2\,dt\right)^{1/2}$$

subject to the continuity equation $\partial_t\rho + \nabla\cdot(\rho v)=0$ (Hamm et al., 2023).
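In one dimension the optimal map $T$ is the monotone rearrangement, so for equal-weight samples the displacement interpolation $\mu_t$ can be sketched by interpolating order statistics (an illustrative sketch; `displacement_interp` is a hypothetical helper):

```python
import numpy as np

def displacement_interp(x, y, t):
    """Displacement interpolation between two 1D equal-weight samples.

    In 1D the optimal Monge map T sends the i-th order statistic of x
    to the i-th order statistic of y, so the W_2 geodesic at time t is
    the empirical measure supported on (1-t)*sort(x) + t*sort(y).
    """
    xs, ys = np.sort(x), np.sort(y)
    return (1.0 - t) * xs + t * ys

x = np.array([0.0, 1.0, 2.0])
y = np.array([4.0, 5.0, 6.0])
print(displacement_interp(x, y, 0.5))  # [2. 3. 4.] -- midpoint measure
```

Unlike the Euclidean mixture $(1-t)\mu + t\nu$, this interpolation moves mass along straight lines, which is exactly the geodesic property described above.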

When restricted to finite-dimensional submanifolds of $\mathcal{P}_{\mathrm{a.c.}}(\Omega)$, the metric inherits pullback Riemannian structures, allowing for local linearization and geometric learning (Hamm et al., 2023).

3. Fundamental Properties and Closed-Form Expressions

3.1 Metric Properties

  • Nonnegativity: $W_2(\mu,\nu)\ge 0$
  • Identity of indiscernibles: $W_2(\mu,\nu)=0$ iff $\mu=\nu$
  • Symmetry: $W_2(\mu,\nu)=W_2(\nu,\mu)$
  • Triangle inequality: $W_2(\mu,\nu)\le W_2(\mu,\lambda) + W_2(\lambda,\nu)$

These establish that $W_2$ is a true metric on $\mathcal{P}_2(\mathbb{R}^d)$ (Snow et al., 2018, Korotin et al., 2019, Wang et al., 2024).

3.2 Explicit Solution: Gaussian Measures

For $\mu = \mathcal{N}(m_1,C_1)$, $\nu = \mathcal{N}(m_2,C_2)$,

$$W_2^2(\mu,\nu) = \|m_1 - m_2\|^2 + \operatorname{tr}\!\left(C_1 + C_2 - 2 \big(C_1^{1/2} C_2 C_1^{1/2}\big)^{1/2}\right)$$

(Oh et al., 2019, Hertz et al., 19 Dec 2025). For quantum states, a direct analogue with similar structure exists, reducing to the classical formula as $\hbar\to 0$ (Hertz et al., 19 Dec 2025).
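The Gaussian closed form translates directly into code (a minimal sketch using `scipy.linalg.sqrtm`; the function name `w2_gaussian` is ours):

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_gaussian(m1, C1, m2, C2):
    """Closed-form W_2 between N(m1, C1) and N(m2, C2):
    W_2^2 = ||m1 - m2||^2 + tr(C1 + C2 - 2 (C1^{1/2} C2 C1^{1/2})^{1/2}).
    """
    C1_half = sqrtm(C1)
    cross = sqrtm(C1_half @ C2 @ C1_half)
    # sqrtm can return a tiny spurious imaginary part; discard it and
    # clamp the trace term at 0 against floating-point error.
    bures = np.trace(C1 + C2 - 2.0 * np.real(cross))
    return np.sqrt(np.sum((m1 - m2) ** 2) + max(float(bures), 0.0))

m1, C1 = np.zeros(2), np.eye(2)
m2, C2 = np.array([3.0, 4.0]), np.eye(2)
print(w2_gaussian(m1, C1, m2, C2))  # 5.0: equal covariances, pure mean shift
```

The trace term is the squared Bures distance between the covariance matrices; for commuting covariances it reduces to $\|C_1^{1/2}-C_2^{1/2}\|_F^2$.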

3.3 Shift-Invariant Extension and Decomposition

The relative-translation-invariant Wasserstein-2 ($RW_2$) distance is defined by

$$RW_2^2(\mu, \nu) = \inf_{s\in \mathbb{R}^n} W_2^2(\mu, (T_s)_\sharp \nu)$$

and satisfies the Pythagorean decomposition

$$W_2^2(\mu, \nu) = RW_2^2(\mu, \nu) + \|\bar{\mu} - \bar{\nu}\|^2$$

where $\bar{\mu}, \bar{\nu}$ are the barycenters (means) of $\mu$ and $\nu$ (Wang et al., 2024).
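The decomposition can be checked numerically in one dimension, where $W_2$ between equal-weight samples reduces to the monotone (sorted) coupling and the optimal shift is the mean difference (an illustrative sketch; `w2_1d` is our helper):

```python
import numpy as np

def w2_1d(x, y):
    """1D W_2 between equal-weight samples via the monotone coupling."""
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

x = np.array([0.0, 2.0, 4.0])     # mean 2
y = np.array([10.0, 11.0, 15.0])  # mean 12

# Shift-invariant part: W_2 between the mean-centered samples.
rw2 = w2_1d(x - x.mean(), y - y.mean())
mean_gap = abs(x.mean() - y.mean())

# Pythagorean decomposition: W_2^2 = RW_2^2 + ||mu_bar - nu_bar||^2
print(np.isclose(w2_1d(x, y) ** 2, rw2 ** 2 + mean_gap ** 2))  # True
```

This separates the "shape" discrepancy ($RW_2$) from the pure translation between the two distributions.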

4. Algorithmic and Statistical Considerations

4.1 Computation

Numerical strategies for $W_2$ include:

  • Kantorovich LP: discrete optimization with cost matrix $c_{ij} = \|x_i - y_j\|^2$ (Snow et al., 2018).
  • Monge–Ampère PDE approaches: for absolutely continuous marginals, reduction to the gradient of a convex potential (Snow et al., 2018).
  • Sinkhorn–Knopp algorithm: entropic regularization for scalable, smooth approximations, accelerated for $RW_2$ via closed-form barycenter updates (Wang et al., 2024).
  • RKHS embedding: kernelization of $W_2$ through feature-space covariance and mean computations (Oh et al., 2019).
  • Gradient ICNNs: learning convex potentials for high-dimensional maps with explicit invertibility guarantees (Korotin et al., 2019).
  • Quantum Gaussian case: covariance-based formula via symplectic invariants (Hertz et al., 19 Dec 2025).

Computational complexity varies, but can be as high as $O(n^3)$ for the exact Kantorovich LP and $O(n^2)$ per iteration with entropic regularization.
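A bare-bones Sinkhorn–Knopp iteration for the entropically regularized problem might look as follows (a minimal sketch, not the accelerated $RW_2$ variant; the regularization `eps` biases the value slightly, and very small `eps` requires log-domain stabilization in practice):

```python
import numpy as np

def sinkhorn_w2(x, y, eps=1e-2, iters=500):
    """Entropic approximation of W_2^2 via Sinkhorn-Knopp.

    Alternately rescales the Gibbs kernel K = exp(-c/eps) so that the
    coupling diag(u) K diag(v) has the prescribed uniform marginals.
    """
    n, m = len(x), len(y)
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    c = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    K = np.exp(-c / eps)
    u = np.ones(n)
    for _ in range(iters):
        v = b / (K.T @ u)  # match column marginals
        u = a / (K @ v)    # match row marginals
    pi = u[:, None] * K * v[None, :]
    return np.sum(pi * c)  # transport cost under the entropic plan

x = np.array([[0.0], [1.0]])
y = np.array([[0.0], [1.0]])
print(sinkhorn_w2(x, y))  # close to 0 for identical supports (small eps bias)
```

Each iteration is two matrix-vector products, which is the source of the $O(n^2)$ per-iteration cost quoted above.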

4.2 Empirical Convergence and Statistical Estimation

  • Normal approximation (CLT): Stein's method and Zolotarev ideal metrics bound $W_2$ for sums of locally dependent random variables, with explicit rates for $m$-dependence, U-statistics, and subgraph counts (Fang, 2018).
  • Moment/cumulant matching: explicit bounds for approximating laws of chaoses via combinatorial discrepancies (e.g., a generalized Stein discrepancy on Wiener chaos) (Arras et al., 2016).
  • Sample-based estimation: consistency of empirical $W_2$ for manifold recovery in Wasserstein space (Hamm et al., 2023).

5. Extensions and Generalizations

5.1 Mixed Variable and Path-Space Metrics

Generalized $W_2$ metrics accommodate mixed continuous and categorical random fields via ground costs of the form

$$\|y\|^2 = \lambda \sum_{i=1}^{d_1} y_i^2 + \sum_{j=d_1+1}^d \hat{\delta}_{y_j,0}$$

with an empirical local Wasserstein structure for stochastic neural network training (Xia et al., 7 Jul 2025).
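As an illustration, such a mixed ground cost might be implemented as below (a hypothetical helper, reading $\hat{\delta}$ as a 0/1 mismatch indicator on the categorical coordinates; the name `mixed_sq_cost` and the encoding are our assumptions):

```python
import numpy as np

def mixed_sq_cost(x, y, d1, lam=1.0):
    """Mixed ground cost: lambda times squared Euclidean distance on the
    first d1 (continuous) coordinates, plus a 0/1 mismatch penalty on
    each remaining (categorical) coordinate, encoded as integers."""
    diff = np.asarray(x) - np.asarray(y)
    cont = lam * np.sum(diff[:d1] ** 2)   # continuous part
    cat = np.sum(diff[d1:] != 0)          # Kronecker-delta mismatches
    return cont + cat

# Two continuous coordinates, one categorical label:
print(mixed_sq_cost([1.0, 2.0, 3], [0.0, 2.0, 5], d1=2))  # 1.0 + 1 = 2.0
```

Plugging such a cost into the Kantorovich formulation yields a $W_2$-type discrepancy over mixed-variable distributions.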

For stochastic processes, the $W_2$ metric extends to trajectory space with time-decoupled and time-coupled functionals, enabling effective SDE parameter recovery (Xia et al., 2024).

5.2 Quantum Wasserstein Distance

Quantum generalizations define $W_2$ via transport over density operators with a quadratic quantum cost, preserving the operational cost-minimization interpretation and recovering the classical $W_2$ in the appropriate limit (Hertz et al., 19 Dec 2025).

5.3 Manifold Learning in Wasserstein Space

Intrinsic geometry on finite-dimensional submanifolds of $\mathcal{P}_2(\Omega)$ supports geodesic restrictions, tangent space estimation, and spectral learning (Hamm et al., 2023).

6. Applications

| Application Domain | Context/Description | Reference |
|---|---|---|
| Image comparison | Pixelwise $W_2$ and PDE/LP-based transport for MNIST, yielding higher classification accuracy than Euclidean or affine-invariant metrics | (Snow et al., 2018) |
| Medical imaging | RKHS-kernelized $W_2$ for texture-based clustering of CT slices, outperforming classical OT | (Oh et al., 2019) |
| SDE/dynamical model reconstruction | Neural-network fitting via a $W_2$-driven loss for stochastic systems, outperforming MMD and likelihood-based baselines | (Xia et al., 2024) |
| Domain adaptation | Input-convex neural network approximation of the $W_2$ map for feature alignment | (Korotin et al., 2019) |
| Quantum information | Quantum Gaussian $W_2$ for state discrimination and metrology | (Hertz et al., 19 Dec 2025) |
| Empirical law approximation | Local-dependence CLT and chaos approximation in normal and second Wiener chaos | (Fang, 2018; Arras et al., 2016) |
| Manifold and graph data | Submanifold recovery and Gromov–Wasserstein consistency using sampled $W_2$ distances | (Hamm et al., 2023) |

7. Ongoing Research and Open Problems

  • Efficiency and scalability: further optimization of entropic or approximate $W_2$ solvers for large-scale, high-dimensional data remains an active area (Wang et al., 2024).
  • Metric extensions: new variants such as $RW_2$ capture shift-invariant similarity and decompose bias/variance effects in distribution-shift contexts (Wang et al., 2024).
  • Higher-order Wasserstein metrics: conjectured explicit Berry–Esseen-type bounds for $W_p$ ($p>2$) under local dependence structures (Fang, 2018).
  • Universal approximation in learning: demonstrated for generalized $W_2$, showing end-to-end universal approximation by stochastic neural networks for arbitrarily complex random fields (Xia et al., 7 Jul 2025).
  • Quantum and noncommutative generalizations: understanding the full scope of quantum $W_2$ for multimode/non-Gaussian states and its comparison with fidelity, trace distance, and Bures distance (Hertz et al., 19 Dec 2025).
  • Spectral and manifold methods: tangent-space extraction and learning with $W_2$ in infinite-dimensional measure spaces (Hamm et al., 2023).
