Optimal Transport (OT)
- Optimal Transport is a mathematical theory that equips the space of probability measures with a metric geometry built from cost-based couplings and convex optimization.
- It uses Kantorovich's relaxation and duality to establish existence and, under regularity assumptions, uniqueness of optimal transport plans. It also yields practical computational methods such as Sinkhorn’s algorithm.
- Wasserstein distances, derived from OT, quantify convergence of measures and underpin applications in image processing, machine learning, and statistical analysis.
Optimal Transport (OT) is a mathematical theory that equips the space of probability measures with a geometry induced by an underlying cost structure, providing a convex variational framework for optimally reallocating mass between probability distributions. In its modern Kantorovich form, OT underpins a vast array of developments in analysis, geometry, statistics, and machine learning, enabling the comparison and manipulation of distributions via optimal couplings. The theory encompasses foundational results such as metric structure, duality, existence and uniqueness (notably Brenier’s theorem), statistical rates, entropic regularization, and computational schemes such as Sinkhorn’s algorithm, with significant implications for both theoretical and applied disciplines (Chewi et al., 2024).
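Sinkhorn’s algorithm for entropically regularized OT can be sketched in a few lines. The sketch assumes two discrete measures with weight vectors `a` and `b` and a cost matrix `C`. The regularized problem (transport cost plus `eps` times the negative entropy of the coupling) has a solution of the form $P = \mathrm{diag}(u)\,K\,\mathrm{diag}(v)$, with Gibbs kernel $K_{ij} = e^{-C_{ij}/\varepsilon}$. Sinkhorn’s algorithm alternately rescales `u` and `v` so that the row and column sums of $P$ match `a` and `b`. The function name `sinkhorn`, the default `eps`, and the fixed iteration count are illustrative assumptions, not part of the source.

```python
import math


def sinkhorn(a, b, C, eps=0.1, n_iters=500):
    """Entropic OT coupling between discrete measures (illustrative sketch, not tuned).

    a: source weights (length n), b: target weights (length m), both summing to 1.
    C: n x m cost matrix as nested lists. Returns P = diag(u) K diag(v),
    where K_ij = exp(-C_ij / eps) is the Gibbs kernel.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so that the row sums of P equal a: u_i = a_i / (K v)_i.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so that the column sums of P equal b: v_j = b_j / (K^T u)_j.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

For example, `sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])` puts almost all mass on the diagonal. For small `eps` the kernel $e^{-C/\varepsilon}$ underflows, so practical implementations usually work in the log domain.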
1. Monge and Kantorovich Formulations
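Classically, Monge’s problem seeks a transport map $T$ pushing a source measure $\mu$ on $\mathcal{X}$ to a target measure $\nu$ on $\mathcal{Y}$. This condition is written $T_\#\mu = \nu$, i.e. $\mu(T^{-1}(B)) = \nu(B)$ for every measurable $B$. The map is chosen to minimize a cost functional:
$$\inf_{T:\,T_\#\mu = \nu} \int c(x, T(x))\, d\mu(x),$$
with a lower-semicontinuous cost $c\colon \mathcal{X}\times\mathcal{Y}\to[0,\infty]$ (e.g. $c(x,y) = \|x-y\|^p$). However, this formulation is nonconvex, because the push-forward constraint is nonlinear in $T$. It may also fail to admit solutions, particularly when $\mu$ lacks absolute continuity. For $\mu = \delta_0$ and $\nu = \tfrac12(\delta_{-1} + \delta_1)$, no map can split the single atom, so no admissible $T$ exists (Chewi et al., 2024).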
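Kantorovich’s relaxation replaces the deterministic map with a probabilistic coupling $\pi \in \Pi(\mu,\nu)$, that is, a probability measure on $\mathcal{X}\times\mathcal{Y}$ with marginals $\mu$ and $\nu$:
$$\Pi(\mu,\nu) = \{\pi \in \mathcal{P}(\mathcal{X}\times\mathcal{Y}) : \pi(A\times\mathcal{Y}) = \mu(A),\ \pi(\mathcal{X}\times B) = \nu(B) \text{ for all measurable } A, B\}.$$
It then considers the convex (indeed linear) minimization
$$\inf_{\pi\in\Pi(\mu,\nu)} \int c(x,y)\, d\pi(x,y).$$
This program admits a minimizer under mild conditions, e.g. Polish spaces and a lower-semicontinuous cost bounded below. Every admissible Monge map $T$ induces the coupling $(\mathrm{id}, T)_\#\mu$, so the Kantorovich value never exceeds the Monge value. The two problems coincide, with an optimal coupling concentrated on the graph of a map, under additional regularity assumptions (e.g., absolute continuity of $\mu$ and quadratic cost) via results such as Brenier’s theorem (Chewi et al., 2024).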
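In the discrete case, take $\mu = \sum_i a_i \delta_{x_i}$ and $\nu = \sum_j b_j \delta_{y_j}$. Couplings then correspond to nonnegative matrices $P$ with row sums $a$ and column sums $b$. This turns the OT problem into a classical linear program, $\min_P \sum_{i,j} c(x_i, y_j) P_{ij}$. When both measures have $n$ points with uniform weights, the feasible set is $1/n$ times the Birkhoff polytope of doubly stochastic matrices, whose vertices are permutation matrices. Hence some optimal coupling is induced by a permutation, and the problem reduces to an assignment problem.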
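The reduction to permutations can be checked by brute force on tiny instances. This is a minimal sketch assuming uniform weights on equally many points and a user-supplied cost function. `assignment_ot` is a hypothetical helper whose running time grows like $n!$; a practical solver would use a linear-programming or Hungarian-algorithm routine instead.

```python
from itertools import permutations


def assignment_ot(xs, ys, cost):
    """Exact OT between uniform measures on equally many points (brute force).

    With uniform weights some optimal coupling is a permutation matrix, so it is
    enough to search over matchings x_i -> y_{perm[i]}. Cost grows like n!.
    Returns (optimal average cost, optimal permutation as a tuple).
    """
    n = len(xs)
    if len(ys) != n:
        raise ValueError("this sketch assumes equally many source and target points")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average cost of the coupling that sends mass 1/n from x_i to y_{perm[i]}.
        avg = sum(cost(xs[i], ys[perm[i]]) for i in range(n)) / n
        if avg < best_cost:
            best_cost, best_perm = avg, perm
    return best_cost, best_perm
```

For instance, `assignment_ot([0.0, 1.0, 2.0], [2.5, 0.5, 1.5], lambda x, y: (x - y) ** 2)` returns `(0.25, (1, 2, 0))`, pairing the points in sorted order.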
2. Duality and Structure of Optimal Couplings
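OT admits a rich dual structure. The Kantorovich dual problem is
$$\sup_{f,\,g}\ \Big\{ \int f\, d\mu + \int g\, d\nu \;:\; f(x) + g(y) \le c(x,y) \text{ for all } (x,y) \in \mathcal{X}\times\mathcal{Y} \Big\}.$$
Any feasible pair $(f,g)$ and any coupling $\pi$ of $\mu$ and $\nu$ satisfy $\int f\,d\mu + \int g\,d\nu = \int (f(x)+g(y))\,d\pi \le \int c\,d\pi$. So the dual value never exceeds the primal value (weak duality). For continuous costs on compact spaces, strong duality holds (the two values coincide) and optimal potentials $(f,g)$ exist. Moreover, the complementary slackness condition $f(x) + g(y) = c(x,y)$ holds $\pi$-almost everywhere for every optimal coupling $\pi$. In the quadratic case $c(x,y) = \|x-y\|^2$, expanding $\|x-y\|^2 = \|x\|^2 - 2\langle x, y\rangle + \|y\|^2$ turns the dual into a problem over convex functions. This recovers convex (Legendre–Fenchel) duality and the subdifferential calculus central to the geometric structure of transport (Chewi et al., 2024).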
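In the discrete setting, weak duality can be checked directly. For any potential $f$ on the source points, the $c$-transform $g(y_j) = \min_i\,[c(x_i, y_j) - f(x_i)]$ is the largest $g$ making $(f, g)$ dual-feasible. The resulting dual value is therefore a lower bound on the primal cost. This sketch assumes squared cost and a three-point example with uniform weights, whose optimal primal cost (attained by matching the points in sorted order) is $0.25$. The helper names `sq_cost`, `c_transform`, and `dual_value` are hypothetical.

```python
def sq_cost(x, y):
    return (x - y) ** 2


def c_transform(f, xs, ys, cost):
    """g(y_j) = min_i [c(x_i, y_j) - f_i]: the largest g with f_i + g_j <= c(x_i, y_j)."""
    return [min(cost(x, y) - fi for x, fi in zip(xs, f)) for y in ys]


def dual_value(f, g, a, b):
    """Discrete dual objective sum_i a_i f_i + sum_j b_j g_j."""
    return sum(ai * fi for ai, fi in zip(a, f)) + sum(bj * gj for bj, gj in zip(b, g))


# Uniform weights on three points; the optimal primal cost is 0.25
# (matching 0 -> 0.5, 1 -> 1.5, 2 -> 2.5).
xs, ys = [0.0, 1.0, 2.0], [2.5, 0.5, 1.5]
a = b = [1 / 3] * 3
for f in ([0.0, 0.0, 0.0], [1.0, -0.5, 0.3]):
    g = c_transform(f, xs, ys, sq_cost)
    print(f, dual_value(f, g, a, b))  # never exceeds 0.25 (weak duality)
```

With $f = 0$ the dual value already equals $0.25$. The pair $(0, f^c)$ is therefore optimal here, and complementary slackness holds on the sorted matching. The second potential gives a strictly smaller value.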
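Suppose the source $\mu$ is absolutely continuous and $\mu$, $\nu$ have finite second moments. Then Brenier’s theorem guarantees the existence and uniqueness ($\mu$-almost everywhere) of an optimal map $T = \nabla\varphi$ for the quadratic cost $c(x,y) = \|x-y\|^2$, where $\varphi$ is a convex potential. The optimal coupling is $\pi = (\mathrm{id}, \nabla\varphi)_\#\mu$ (Chewi et al., 2024). In one dimension, the optimal plan is given in closed form by the increasing rearrangement: $T = F_\nu^{-1}\circ F_\mu$ when $\mu$ has no atoms. Here $F_\mu$ is the cumulative distribution function of $\mu$ and $F_\mu^{-1}$ its quantile function. For $p \ge 1$ the Wasserstein distance then admits the explicit quantile representation
$$W_p^p(\mu,\nu) = \int_0^1 \big|F_\mu^{-1}(t) - F_\nu^{-1}(t)\big|^p\, dt.$$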
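For two uniform empirical measures with the same number of points, the increasing rearrangement simply pairs order statistics, so $W_p$ reduces to a sort. This is a minimal sketch under that equal-size assumption; `wasserstein_1d` is a hypothetical name.

```python
def wasserstein_1d(xs, ys, p=2.0):
    """W_p between uniform empirical measures on equally many points of the real line.

    The increasing rearrangement is optimal in 1D, so the i-th smallest x is
    paired with the i-th smallest y (the quantile formula for empirical measures).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many points")
    total = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (total / len(xs)) ** (1.0 / p)
```

Here `wasserstein_1d([0.0, 1.0, 2.0], [2.5, 0.5, 1.5])` gives `0.5`. Shifting every point by $t$ gives a distance equal to $|t|$.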
3. Wasserstein Distances: Metric Structure and Properties
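Let $\mathcal{P}_p(\mathbb{R}^d)$ denote the set of probability measures with finite $p$-th moment, and let $\Pi(\mu,\nu)$ be the set of couplings of $\mu$ and $\nu$. For $p \ge 1$, the $p$-Wasserstein distance on $\mathcal{P}_p(\mathbb{R}^d)$ is defined as
$$W_p(\mu,\nu) = \Big( \inf_{\pi\in\Pi(\mu,\nu)} \int \|x-y\|^p\, d\pi(x,y) \Big)^{1/p}.$$
It is a true metric on $\mathcal{P}_p(\mathbb{R}^d)$, and it metrizes weak convergence plus convergence of $p$-th moments. That is, $W_p(\mu_n,\mu) \to 0$ if and only if $\mu_n \to \mu$ weakly and $\int \|x\|^p\, d\mu_n \to \int \|x\|^p\, d\mu$. Fundamental properties include:
- monotonicity in $p$: $W_p \le W_q$ for $p \le q$, by Jensen’s inequality;
- boundedness by total variation on bounded sets: $W_p^p(\mu,\nu) \le \operatorname{diam}(\mathcal{X})^p\, \mathrm{TV}(\mu,\nu)$ for measures supported on a bounded set $\mathcal{X}$;
- explicit quantile formulas in dimension one (Chewi et al., 2024).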
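For discrete measures on the line with arbitrary weights, both quantile functions are step functions. The one-dimensional quantile formula can therefore be evaluated exactly by merging their breakpoints, and the result can be used to observe the monotonicity of $W_p$ in $p$. This sketch assumes nonnegative weights summing to one; `wasserstein_1d_weighted` is a hypothetical helper.

```python
def wasserstein_1d_weighted(x, a, y, b, p=1.0):
    """W_p between sum_i a_i delta_{x_i} and sum_j b_j delta_{y_j} on the real line.

    Evaluates W_p^p = integral over t in [0, 1] of |F_mu^{-1}(t) - F_nu^{-1}(t)|^p
    exactly: both quantile functions are step functions, so we walk through the
    merged breakpoints. Weights are assumed nonnegative and to sum to 1.
    """
    xa = sorted(zip(x, a))  # atoms of mu in increasing order
    yb = sorted(zip(y, b))  # atoms of nu in increasing order
    i = j = 0
    ra, rb = xa[0][1], yb[0][1]  # mass not yet matched in the current atoms
    total = 0.0
    while i < len(xa) and j < len(yb):
        m = min(ra, rb)  # length of a t-interval on which both quantiles are constant
        total += m * abs(xa[i][0] - yb[j][0]) ** p
        ra -= m
        rb -= m
        if ra <= 0.0:  # current mu atom exhausted: move to the next one
            i += 1
            if i < len(xa):
                ra = xa[i][1]
        if rb <= 0.0:  # current nu atom exhausted: move to the next one
            j += 1
            if j < len(yb):
                rb = yb[j][1]
    return total ** (1.0 / p)
```

For $\mu = \tfrac12(\delta_0 + \delta_1)$ and $\nu = \delta_0$, the helper returns the following values, which increase in $p$ as expected:
- $W_1 = 0.5$
- $W_2 = \sqrt{0.5} \approx 0.707$
- $W_4 = 0.5^{1/4} \approx 0.841$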
The Wasserstein distance underpins geometric analysis and characterizes convergence of empirical measures at quantifiable rates. When $d \ge 3$, the minimax rate for estimating a measure on $[0,1]^d$ in $W_1$ from $n$ i.i.d. samples is $n^{-1/d}$, a manifestation of the curse of dimensionality (Chewi et al., 2024). For smoother underlying measures