
Optimal Transport (OT)

Updated 26 January 2026
  • Optimal Transport is a mathematical theory that structures probability measures into a metric space using cost-based couplings and convex optimization.
  • It builds on Kantorovich's formulation and duality to establish existence and uniqueness results and to support practical computational methods such as Sinkhorn’s algorithm.
  • Wasserstein distances, derived from OT, quantify the convergence of probability measures and underpin applications in image processing, machine learning, and statistical analysis.

Optimal Transport (OT) is a mathematical theory that equips the space of probability measures with a geometry induced by an underlying cost structure, providing a convex variational framework for optimally reallocating mass between probability distributions. In its modern Kantorovich form, OT underpins a vast array of developments in analysis, geometry, statistics, and machine learning, enabling the comparison and manipulation of distributions via optimal couplings. The theory encompasses foundational results such as metric structure, duality, existence and uniqueness (notably Brenier’s theorem), statistical rates, entropic regularization, and computational schemes such as Sinkhorn’s algorithm, with significant implications for both theoretical and applied disciplines (Chewi et al., 2024).

1. Monge and Kantorovich Formulations

Classically, Monge’s problem seeks a transport map $T:X\to X$ pushing a source measure $\mu$ to a target measure $\nu$ by minimizing a cost functional: $$\inf_{T:T_\#\mu=\nu} \int_X c(x,T(x))\,\mu(dx),$$ with $c:X\times X\to[0,\infty)$ a lower-semicontinuous cost (e.g. $c(x,y)=\|x-y\|^p$). However, this formulation is nonconvex and may fail to admit solutions, particularly when $\mu$ or $\nu$ lacks absolute continuity (Chewi et al., 2024).

Kantorovich’s relaxation replaces the deterministic map with a probabilistic coupling $\gamma\in\Pi(\mu,\nu)$: $$\Pi(\mu,\nu) = \bigl\{\gamma\in\mathcal P(X\times X): \gamma(\cdot\times X)=\mu,\, \gamma(X\times\cdot)=\nu \bigr\},$$ and considers the convex minimization: $$\min_{\gamma\in\Pi(\mu,\nu)}\int_{X\times X} c(x,y)\,\gamma(dx,dy).$$ This convex program always admits solutions under mild conditions and reduces to the Monge formulation under additional regularity assumptions (e.g., absolute continuity and quadratic cost) via results like Brenier’s theorem (Chewi et al., 2024).
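A standard two-point illustration (not taken from the source) shows why the relaxation is needed. A Dirac mass cannot be split by a map, so Monge's problem has no admissible map, while the Kantorovich problem has exactly one coupling:

```latex
\mu=\delta_0,\qquad \nu=\tfrac12\,\delta_{-1}+\tfrac12\,\delta_{1}.
% Any map T gives T_\#\mu=\delta_{T(0)}\neq\nu, so the Monge constraint set is empty.
% The only coupling of \mu and \nu is the product measure:
\gamma=\mu\otimes\nu=\tfrac12\,\delta_{(0,-1)}+\tfrac12\,\delta_{(0,1)},
\qquad
\int c\,d\gamma=\tfrac12\,c(0,-1)+\tfrac12\,c(0,1).
```

For the cost $c(x,y)=|x-y|^p$ this gives a Kantorovich value of $1$, attained by splitting the mass at $0$ equally between $-1$ and $1$.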

In the discrete case, couplings correspond to nonnegative matrices $P$ with prescribed row and column sums, which turns the OT problem into a classical linear program. When both marginals are uniform on the same number of points, an optimal solution can be chosen to be a (rescaled) permutation matrix, so the optimal plan is sparse.
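For uniform marginals on equally many points, the discrete problem can be illustrated by searching over permutations directly. The sketch below is a brute-force illustration only (factorial cost, so suitable for tiny inputs), not a practical solver; the function name `assignment_ot` and the sample points are hypothetical choices made for this example.

```python
from itertools import permutations


def assignment_ot(cost):
    """Brute-force OT between two uniform measures on n points each.

    `cost[i][j]` is the cost of moving source point i to target point j.
    Returns the best permutation and the resulting transport cost.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    # Each matched pair carries mass 1/n, so the transport cost is the average.
    return best_perm, best_total / n


# Illustrative data: three source and three target points on the real line.
xs = [0.0, 1.0, 3.0]
ys = [2.5, 0.5, 1.5]
cost = [[(x - y) ** 2 for y in ys] for x in xs]  # quadratic cost

perm, value = assignment_ot(cost)
print(perm, value)
```

Here `perm[i]` is the index of the target point matched to source point `i`; the corresponding coupling matrix has entries `1/n` at positions `(i, perm[i])` and zeros elsewhere.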

2. Duality and Structure of Optimal Couplings

OT admits a rich dual structure. The Kantorovich dual problem is

$$\inf_{\gamma\in\Pi(\mu,\nu)} \int c\,d\gamma \ge \sup_{f,g}\left\{ \int f\,d\mu+\int g\,d\nu : f(x)+g(y)\le c(x,y) \right\}.$$
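This inequality (weak duality) follows in one line from the marginal constraints. As a brief sketch, take any admissible pair $(f,g)$, assumed integrable, and any coupling $\gamma\in\Pi(\mu,\nu)$:

```latex
\int f\,d\mu + \int g\,d\nu
  = \int_{X\times X} \bigl(f(x)+g(y)\bigr)\,d\gamma(x,y)
  \le \int_{X\times X} c(x,y)\,d\gamma(x,y).
```

The equality uses that $\gamma$ has marginals $\mu$ and $\nu$, and the inequality uses the constraint $f(x)+g(y)\le c(x,y)$. Taking the supremum over $(f,g)$ on the left and the infimum over $\gamma$ on the right gives the bound.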

For continuous costs on compact spaces, strong duality holds, optimal potentials $(f^*,g^*)$ exist, and the complementary slackness condition $f^*(x)+g^*(y)=c(x,y)$ holds $\gamma^*$-almost everywhere. In the quadratic case, this recovers convex duality and subdifferential calculus central to the geometric structure of transport (Chewi et al., 2024).

If the source is absolutely continuous, Brenier’s theorem guarantees the existence and uniqueness (up to sets of measure zero) of an optimal map $T=\nabla\varphi$ for $c(x,y)=\|x-y\|^2$, with $\varphi$ a convex potential and the coupling $\gamma=(\text{id},T)_\#\mu$ (Chewi et al., 2024). In one dimension, the optimal plan is given in closed form via the increasing rearrangement, and the Wasserstein distance admits an explicit quantile representation.
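The one-dimensional closed form can be sketched for empirical measures: with equally many, equally weighted atoms, the increasing rearrangement matches the $i$-th smallest source point to the $i$-th smallest target point. The function name `wasserstein_1d`, the equal-sample-size restriction, and the data are assumptions made for this illustration.

```python
def wasserstein_1d(xs, ys, p=2):
    """W_p between two empirical measures on the real line.

    Assumes both samples have the same size and uniform weights, so the
    monotone (sorted) matching is an optimal coupling for the cost |x - y|^p
    with p >= 1.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    n = len(xs)
    # Pair the i-th order statistics of the two samples.
    total = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (total / n) ** (1.0 / p)


# Illustrative samples (hypothetical data).
xs = [0.0, 1.0, 3.0]
ys = [5.0, 1.0, 1.5]

w1 = wasserstein_1d(xs, ys, p=1)
w2 = wasserstein_1d(xs, ys, p=2)
print(w1, w2)
```

On this data the sorted differences are 1, 0.5, and 2, so `w1` is their mean and `w2` is the root mean square; `w1 <= w2`, consistent with the fact that Wasserstein distances are nondecreasing in the order $p$.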

3. Wasserstein Distances: Metric Structure and Properties

For $p\ge1$, the $p$-Wasserstein distance on $\mathcal P_p(\mathbb R^d)$ is defined as

$$W_p(\mu,\nu) = \left(\inf_{\gamma\in\Pi(\mu,\nu)} \int \|x-y\|^p\,d\gamma(x,y)\right)^{1/p},$$

providing a true metric that metrizes weak convergence (plus moment convergence) of measures. Fundamental properties include monotonicity in $p$, boundedness by total variation on bounded sets, and explicit quantile formulas in dimension one (Chewi et al., 2024).
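In their standard forms, these three properties can be written as follows. This is a sketch under stated assumptions: the total variation norm is normalized here as $\sup_A|\mu(A)-\nu(A)|$, and the cited reference may state the bounds with different constants or conventions.

```latex
% Monotonicity in p (Jensen's inequality applied to any coupling), for 1 \le p \le q:
W_p(\mu,\nu) \le W_q(\mu,\nu).

% Total variation bound when \mu and \nu are supported on a set of diameter D:
W_p(\mu,\nu)^p \le D^p\,\|\mu-\nu\|_{\mathrm{TV}},
\qquad
\|\mu-\nu\|_{\mathrm{TV}} = \sup_{A}\,|\mu(A)-\nu(A)|.

% Quantile formula in dimension one, with F_\mu^{-1} the quantile function of \mu:
W_p(\mu,\nu)^p = \int_0^1 \bigl|F_\mu^{-1}(u)-F_\nu^{-1}(u)\bigr|^p\,du.
```

The total variation bound comes from a maximal coupling $(X,Y)$ of $\mu$ and $\nu$, for which $\mathbb P(X\ne Y)=\|\mu-\nu\|_{\mathrm{TV}}$ and $\|X-Y\|^p\le D^p$ on the event $\{X\ne Y\}$.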

The Wasserstein distance underpins geometric analysis and characterizes convergence of empirical measures at quantifiable rates. When $d>2p$, the minimax rate for $\mathbb E\,W_p(\mu_n,\mu)$ is $n^{-1/d}$, a manifestation of the curse of dimensionality (Chewi et al., 2024). For smoother underlying measures

References (1)
