Optimal Transport (OT)

Updated 26 January 2026

Optimal Transport is a mathematical theory that equips the space of probability measures with a metric structure using cost-based couplings and convex optimization.
It builds on Kantorovich's formulation and duality to establish existence and uniqueness of optimal couplings, and it supports practical computational methods such as Sinkhorn’s algorithm.
Wasserstein distances, derived from OT, quantify the convergence of measures and underpin applications in image processing, machine learning, and statistical analysis.

Optimal Transport (OT) is a mathematical theory that equips the space of probability measures with a geometry induced by an underlying cost structure, providing a convex variational framework for optimally reallocating mass between probability distributions. In its modern Kantorovich form, OT underpins a vast array of developments in analysis, geometry, statistics, and machine learning, enabling the comparison and manipulation of distributions via optimal couplings. The theory encompasses foundational results such as metric structure, duality, existence and uniqueness (notably Brenier’s theorem), statistical rates, entropic regularization, and computational schemes such as Sinkhorn’s algorithm, with significant implications for both theoretical and applied disciplines (Chewi et al., 2024).

1. Monge and Kantorovich Formulations

Classically, Monge’s problem seeks a transport map $T:X\to X$ pushing a source measure $\mu$ forward to a target measure $\nu$ while minimizing a cost functional: $\inf_{T:T_\#\mu=\nu} \int_X c(x,T(x))\,\mu(dx),$ with $c:X\times X\to[0,\infty)$ a lower-semicontinuous cost (e.g. $c(x,y)=\|x-y\|^p$). However, the constraint $T_\#\mu=\nu$ is nonconvex, and the problem may fail to admit solutions or even admissible maps, particularly when $\mu$ is not absolutely continuous: a Dirac mass, for instance, cannot be mapped onto a measure supported on two points (Chewi et al., 2024).

Kantorovich’s relaxation replaces the deterministic map with a probabilistic coupling $\gamma\in\Pi(\mu,\nu)$, where $\Pi(\mu,\nu) = \bigl\{\gamma\in\mathcal P(X\times X): \gamma(\cdot\times X)=\mu,\, \gamma(X\times\cdot)=\nu \bigr\}$ is the set of joint distributions with marginals $\mu$ and $\nu$, and considers the convex minimization: $\inf_{\gamma\in\Pi(\mu,\nu)} \int_{X\times X} c(x,y)\,\gamma(dx,dy).$ This convex program admits a minimizer under mild conditions and reduces to the Monge formulation under additional regularity assumptions (e.g., absolute continuity and quadratic cost) via results like Brenier’s theorem (Chewi et al., 2024).

In the discrete case, couplings correspond to nonnegative matrices $\gamma\in\mathbb{R}_{\ge 0}^{n\times m}$ with prescribed row and column sums, which turns the OT problem into a classical linear program. When both marginals are uniform on the same number of points, an optimal coupling can be chosen to be a rescaled permutation matrix, so the optimal plan is sparse.
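To make the discrete case concrete, the following minimal sketch solves a tiny instance with uniform marginals by enumerating permutations. It relies on the standard fact that the extreme points of the set of doubly stochastic matrices are permutation matrices, so restricting to permutations loses nothing here. The function name `brute_force_assignment`, the cost $|x-y|^p$, and the sample points are illustrative assumptions; enumeration is only feasible for very small $n$.

```python
from itertools import permutations

def brute_force_assignment(xs, ys, p=2):
    """Optimal transport between two uniform measures on n points each.

    Illustrative helper (not from the source): enumerates all n!
    permutations, so it is only usable for very small n.
    Returns (optimal cost, optimal permutation as a tuple).
    """
    n = len(xs)
    if len(ys) != n:
        raise ValueError("uniform marginals need the same number of points")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Each point carries mass 1/n, hence the division by n.
        cost = sum(abs(xs[i] - ys[perm[i]]) ** p for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm

# Hypothetical sample points on the real line.
xs = [0.0, 1.0, 3.0]
ys = [2.5, 0.5, 1.5]
cost, perm = brute_force_assignment(xs, ys, p=2)
print(cost, perm)  # source point xs[i] is sent to ys[perm[i]]
```

In this example the best permutation matches the points in sorted order, which is what one would expect for a convex cost on the real line.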

2. Duality and Structure of Optimal Couplings

OT admits a rich dual structure. The Kantorovich dual problem is

$\sup_{(f,g):\, f(x)+g(y)\le c(x,y)} \int_X f(x)\,\mu(dx) + \int_X g(y)\,\nu(dy).$

For continuous costs on compact spaces, strong duality holds, optimal potentials $(f^\star,g^\star)$ exist, and the complementary slackness condition $f^\star(x)+g^\star(y)=c(x,y)$ holds $\gamma^\star$-almost everywhere for any optimal coupling $\gamma^\star$. In the quadratic case, this recovers convex duality and subdifferential calculus central to the geometric structure of transport (Chewi et al., 2024).
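Duality and complementary slackness can be checked by hand on a two-point example. The sketch below is illustrative only: the helper names (`c_transform`, `dual_value`, `primal_value`), the points, and the candidate potentials are assumptions chosen for the example. Given a potential `f`, the $c$-transform produces the largest `g` satisfying the constraint `f[i] + g[j] <= cost[i][j]`, so every pair built this way is dual-feasible and its value lower-bounds the cost of any coupling.

```python
def c_transform(f, cost):
    """Largest g with f[i] + g[j] <= cost[i][j] for all i, j."""
    n, m = len(cost), len(cost[0])
    return [min(cost[i][j] - f[i] for i in range(n)) for j in range(m)]

def dual_value(f, g, mu, nu):
    """Dual objective: integral of f against mu plus integral of g against nu."""
    return sum(fi * mi for fi, mi in zip(f, mu)) + sum(gj * nj for gj, nj in zip(g, nu))

def primal_value(gamma, cost):
    """Transport cost of a coupling matrix gamma."""
    return sum(gamma[i][j] * cost[i][j]
               for i in range(len(cost)) for j in range(len(cost[0])))

# Hypothetical two-point example with quadratic cost.
xs, ys = [0.0, 1.0], [2.0, 4.0]
mu, nu = [0.5, 0.5], [0.5, 0.5]
cost = [[(x - y) ** 2 for y in ys] for x in xs]   # [[4, 16], [1, 9]]

gamma = [[0.5, 0.0], [0.0, 0.5]]                  # monotone coupling
primal = primal_value(gamma, cost)

f_loose = [0.0, 0.0]                              # feasible but not optimal
dual_loose = dual_value(f_loose, c_transform(f_loose, cost), mu, nu)

f_tight = [0.0, -3.0]                             # attains the primal value
g_tight = c_transform(f_tight, cost)
dual_tight = dual_value(f_tight, g_tight, mu, nu)

# Complementary slackness: the constraint is tight wherever gamma puts mass.
slack_ok = all(abs(f_tight[i] + g_tight[j] - cost[i][j]) < 1e-12
               for i in range(2) for j in range(2) if gamma[i][j] > 0)
print(primal, dual_loose, dual_tight, slack_ok)
```

Because the dual value of `f_tight` equals the cost of `gamma`, weak duality certifies that both are optimal for this instance.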

If the source is absolutely continuous, Brenier’s theorem guarantees the existence and uniqueness (up to sets of $\mu$-measure zero) of an optimal map $T=\nabla\varphi$ for $c(x,y)=\|x-y\|^2$, with $\varphi$ a convex potential and the optimal coupling $\gamma^\star=(\mathrm{id},\nabla\varphi)_\#\mu$ (Chewi et al., 2024). In one dimension, the optimal plan is given in closed form via the increasing rearrangement, and the Wasserstein distance admits an explicit quantile representation.
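For empirical measures on the real line, the increasing rearrangement amounts to sorting. The sketch below assumes two samples of equal size with uniform weights and a cost $|x-y|^p$ with $p\ge 1$; the function names `monotone_map` and `wasserstein_1d` are illustrative, not taken from the source.

```python
def monotone_map(xs, ys):
    """Increasing rearrangement between two equal-size samples.

    Returns a list T with T[i] the target point assigned to xs[i]:
    the k-th smallest source point goes to the k-th smallest target point.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    order_x = sorted(range(len(xs)), key=xs.__getitem__)
    sorted_y = sorted(ys)
    T = [None] * len(xs)
    for rank, i in enumerate(order_x):
        T[i] = sorted_y[rank]
    return T

def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between uniform empirical measures on the line,
    computed by matching order statistics (empirical quantiles)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(xs)
    total = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys)))
    return (total / n) ** (1.0 / p)

# Hypothetical samples.
xs = [3.0, 0.0, 1.0]
ys = [0.5, 2.5, 1.5]
T = monotone_map(xs, ys)
w1 = wasserstein_1d(xs, ys, p=1)
w2 = wasserstein_1d(xs, ys, p=2)
print(T, w1, w2)
```

Sorting costs $O(n\log n)$, which is why the one-dimensional case is so much cheaper than a general linear program.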

3. Wasserstein Distances: Metric Structure and Properties

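For $p\ge 1$, the $p$-Wasserstein distance on $\mathcal P_p(\mathbb{R}^d)$, the set of probability measures with finite $p$-th moment, is defined as

$W_p(\mu,\nu) = \Bigl(\inf_{\gamma\in\Pi(\mu,\nu)} \int \|x-y\|^p\,\gamma(dx,dy)\Bigr)^{1/p},$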

providing a true metric that metrizes weak convergence together with convergence of $p$-th moments. Fundamental properties include monotonicity in $p$, control by the total variation distance on bounded sets, and explicit quantile formulas in dimension one (Chewi et al., 2024).
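Two of these properties follow from short arguments, sketched here under stated assumptions: the total variation distance is normalized as $\mathrm{TV}(\mu,\nu)=\sup_A|\mu(A)-\nu(A)|$, and for the second bound both measures are supported in a set of diameter $D$ (a symbol introduced only for this sketch).

```latex
% Monotonicity in p: for 1 \le p \le q and any coupling \gamma \in \Pi(\mu,\nu),
% Jensen's inequality for the convex map t \mapsto t^{q/p} gives
\Bigl(\int \|x-y\|^p \,\gamma(dx,dy)\Bigr)^{1/p}
  \le \Bigl(\int \|x-y\|^q \,\gamma(dx,dy)\Bigr)^{1/q},
% and taking the infimum over \gamma yields
W_p(\mu,\nu) \le W_q(\mu,\nu).

% Total variation bound: on a set of diameter D,
% \|x-y\|^p \le D^p \,\mathbf{1}\{x \ne y\}, so
W_p^p(\mu,\nu)
  \le D^p \inf_{\gamma\in\Pi(\mu,\nu)} \gamma(\{x \ne y\})
  = D^p \,\mathrm{TV}(\mu,\nu).
```

The last equality uses the coupling characterization of total variation.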

The Wasserstein distance underpins geometric analysis and characterizes convergence of empirical measures at quantifiable rates. When $d\ge 3$, the minimax rate for $W_1$ based on $n$ samples is $n^{-1/d}$, a manifestation of the curse of dimensionality (Chewi et al., 2024). For smoother underlying measures

References (1)

Chewi et al. (2024). Statistical optimal transport.
