Entropically Regularized Optimal Transport (EOT)
- Entropically Regularized Optimal Transport (EOT) is a method that introduces an entropy term to the classical optimal transport problem, ensuring smoothness and efficient computation via the Sinkhorn algorithm.
- The Sinkhorn algorithm, along with its variants like Greenkhorn, iteratively updates scaling factors to enforce marginal constraints with linear convergence and reduced per-iteration cost.
- EOT is equipped with rigorous statistical tools, including central limit theorems and bootstrap methods, which support valid inference and bias–variance control in large-scale data analysis.
Entropically Regularized Optimal Transport (EOT) is a computational and statistical framework for optimal transport (OT) between probability distributions in which the Kantorovich objective is modified by an additional entropy (or relative entropy) term. This regularization yields a strictly convex cost that facilitates efficient computation via the Sinkhorn algorithm and provides strong smoothness and differentiability properties. EOT and its associated Sinkhorn divergences/metrics have become fundamental tools in large-scale data analysis, machine learning, and statistics due to their algorithmic tractability and well-understood inferential properties.
1. Mathematical Formulation: Primal, Dual, and Centered Loss
On a finite metric space $\{x_1, \dots, x_N\}$ with cost matrix $C = (C_{ij})_{1 \le i,j \le N}$ specified by $C_{ij} = d(x_i, x_j)^p$, the EOT cost between probability weights $a, b \in \Sigma_N$ (the probability simplex) is
$$
S_{\varepsilon}(a, b) = \min_{T \in U(a,b)} \; \langle C, T \rangle + \varepsilon \, \mathrm{KL}(T \,\|\, a b^{\top}),
$$
where $U(a,b) = \{ T \in \mathbb{R}_{+}^{N \times N} : T \mathbf{1} = a, \; T^{\top} \mathbf{1} = b \}$ is the transport polytope of couplings with prescribed marginals and $\mathrm{KL}(T \,\|\, a b^{\top}) = \sum_{i,j} T_{ij} \log \tfrac{T_{ij}}{a_i b_j}$ is the relative entropy.
The Fenchel dual is
$$
S_{\varepsilon}(a, b) = \max_{f, g \in \mathbb{R}^{N}} \; \langle f, a \rangle + \langle g, b \rangle - \varepsilon \sum_{i,j} a_i b_j \left( e^{(f_i + g_j - C_{ij})/\varepsilon} - 1 \right).
$$
Uniqueness of optimizers $(f^{*}, g^{*})$ holds up to additive constants ($f^{*} + c$, $g^{*} - c$).
The EOT cost is not a true metric since $S_{\varepsilon}(a, a) \neq 0$ in general. The centered Sinkhorn loss is defined as
$$
\bar{S}_{\varepsilon}(a, b) = S_{\varepsilon}(a, b) - \tfrac{1}{2} S_{\varepsilon}(a, a) - \tfrac{1}{2} S_{\varepsilon}(b, b),
$$
which is non-negative and vanishes if and only if $a = b$. As $\varepsilon \to 0$, $S_{\varepsilon}$ converges to the (unregularized) $p$-Wasserstein cost (Bigot et al., 2017).
2. Algorithmic Computation: Sinkhorn and Greedy Methods
Given $\varepsilon > 0$, the unique optimal coupling has the factorized form
$$
T^{*} = \mathrm{diag}(u) \, K \, \mathrm{diag}(v), \qquad K_{ij} = e^{-C_{ij}/\varepsilon},
$$
where $u, v \in \mathbb{R}_{+}^{N}$ are scaling vectors (for the relative-entropy form, the marginal factors $a_i b_j$ are absorbed into the scalings). The scaling vectors solve the marginal constraints by alternating updates:
$$
u \leftarrow a \oslash (K v), \qquad v \leftarrow b \oslash (K^{\top} u),
$$
where $\oslash$ denotes componentwise division. This Sinkhorn algorithm converges geometrically (linearly) under positivity of $K$, and $O(N^2)$ arithmetic per iteration is required; $50$–$200$ iterations are typical for moderate $\varepsilon$.
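The alternating updates above can be sketched in a few lines of NumPy. This is an illustrative, unstabilized implementation using the plain-entropy Gibbs kernel; for small $\varepsilon$ a log-domain version is needed to avoid underflow, and the function name and parameters are choices made here, not a fixed API.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iter=1000):
    """Entropic OT by Sinkhorn scaling (plain-entropy Gibbs kernel).

    Returns the coupling T* = diag(u) K diag(v) and the transport
    cost <C, T*>.  Dense, unstabilized sketch.
    """
    K = np.exp(-C / eps)                 # Gibbs kernel K_ij = e^{-C_ij/eps}
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # enforce column marginals
        u = a / (K @ v)                  # enforce row marginals
    T = u[:, None] * K * v[None, :]      # diag(u) K diag(v)
    return T, float(np.sum(T * C))

# toy example on a 1-D grid with squared-distance cost
x = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
T, cost = sinkhorn(a, b, C, eps=0.1)
print(T.sum(axis=1), T.sum(axis=0))      # ~ a and b
```

Because the final $u$-update enforces the row marginals exactly, a tolerance-based stopping rule only needs to monitor the column residual.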
Variants such as Greenkhorn and Greedy Stochastic Sinkhorn update only the most violated row or column at each step, reducing per-iteration cost; in favorable regimes they outperform standard Sinkhorn in wall-clock time (Abid et al., 2018).
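A minimal sketch of the greedy idea follows. The function name and loop structure are illustrative; for clarity the full marginals are recomputed at each step, whereas an actual Greenkhorn implementation maintains the row and column sums incrementally so that each step costs only $O(N)$.

```python
import numpy as np

def greenkhorn(a, b, C, eps=0.1, n_steps=5000):
    """Greedy Sinkhorn sketch: each step rescales only the single row
    or column whose marginal constraint is most violated."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_steps):
        T = u[:, None] * K * v[None, :]
        r_err = np.abs(T.sum(axis=1) - a)   # row-marginal violations
        c_err = np.abs(T.sum(axis=0) - b)   # column-marginal violations
        i, j = r_err.argmax(), c_err.argmax()
        if r_err[i] >= c_err[j]:
            u[i] = a[i] / (K[i, :] @ v)     # fix the worst row exactly
        else:
            v[j] = b[j] / (K[:, j] @ u)     # fix the worst column exactly
    return u[:, None] * K * v[None, :]

x = np.linspace(0.0, 1.0, 5)
C = (x[:, None] - x[None, :]) ** 2
a = np.full(5, 0.2)
b = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
T = greenkhorn(a, b, C, eps=0.1)
```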
3. Statistical Properties: Central Limit Theorems and Bootstrap
The map $a \mapsto S_{\varepsilon}(a, b)$ is Fréchet-differentiable on the interior of the simplex $\Sigma_N$, with derivative
$$
\nabla_a S_{\varepsilon}(a, b) = f^{*},
$$
where $(f^{*}, g^{*})$ are any dual optimizers (the derivative is defined up to an additive constant, which is immaterial along directions tangent to the simplex).
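This envelope-theorem property can be checked numerically. The sketch below uses the plain-entropy formulation, for which the gradient of the regularized value with respect to $a$ is the dual potential $\varepsilon \log u$ (again up to an additive constant that vanishes along simplex directions); the helper name is illustrative.

```python
import numpy as np

def sinkhorn_dual(a, b, C, eps, n_iter=2000):
    """Entropic OT value (plain-entropy form) and the dual potential
    f = eps * log(u); illustrative helper."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]
    # primal value including the entropy term  <C,T> + eps * sum T (log T - 1)
    val = float(np.sum(T * C) + eps * np.sum(T * (np.log(T) - 1.0)))
    return val, eps * np.log(u)

# finite-difference check of  grad_a S_eps(a, b) = f  along e_0 - e_1
x = np.linspace(0.0, 1.0, 4)
C = (x[:, None] - x[None, :]) ** 2
a = np.array([0.3, 0.3, 0.2, 0.2])
b = np.array([0.1, 0.2, 0.3, 0.4])
d = np.array([1.0, -1.0, 0.0, 0.0])   # direction tangent to the simplex
t = 1e-5
v0, f = sinkhorn_dual(a, b, C, eps=0.2)
v1, _ = sinkhorn_dual(a + t * d, b, C, eps=0.2)
fd = (v1 - v0) / t
print(fd, f[0] - f[1])                # directional derivative matches f_0 - f_1
```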
Let $\hat{a}_n$ and $\hat{b}_m$ be empirical measures built from $n$ and $m$ i.i.d. samples. For $\rho_{n,m} = \sqrt{nm/(n+m)}$ and multinomial covariances $\Sigma(a) = \mathrm{diag}(a) - a a^{\top}$, $\Sigma(b) = \mathrm{diag}(b) - b b^{\top}$, asymptotic normality holds:
- For the (non-centered) Sinkhorn divergence,
$$
\sqrt{n}\,\big( S_{\varepsilon}(\hat{a}_n, b) - S_{\varepsilon}(a, b) \big) \xrightarrow{d} \langle G_a, f^{*} \rangle,
$$
and for two-sample,
$$
\rho_{n,m}\,\big( S_{\varepsilon}(\hat{a}_n, \hat{b}_m) - S_{\varepsilon}(a, b) \big) \xrightarrow{d} \sqrt{\gamma}\,\langle G_a, f^{*} \rangle + \sqrt{1-\gamma}\,\langle G_b, g^{*} \rangle, \qquad \tfrac{m}{n+m} \to \gamma,
$$
with $G_a \sim \mathcal{N}(0, \Sigma(a))$ and $G_b \sim \mathcal{N}(0, \Sigma(b))$ independent Gaussian limits.
- For the centered Sinkhorn loss $\bar{S}_{\varepsilon}$, similar central limit theorems describe both the alternative ($a \neq b$) and null ($a = b$) regimes. Under $a = b$, the limit is non-Gaussian and mixes weighted chi-square variables determined by the Hessian of $\bar{S}_{\varepsilon}$ at $(a, a)$ (Bigot et al., 2017).
These results extend the statistical theory of OT to the regularized regime, and the limit laws are essential for valid hypothesis tests and confidence intervals.
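A small Monte Carlo experiment makes the one-sample CLT concrete: the standard deviation of $\sqrt{n}\,( S_{\varepsilon}(\hat{a}_n, b) - S_{\varepsilon}(a, b) )$ should approach the delta-method value $\sqrt{f^{*\top} \Sigma(a) f^{*}}$. The sketch uses the plain-entropy form; helper names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sinkhorn_value(a, b, C, eps, n_iter=500):
    """Plain-entropy entropic OT value and dual potential (sketch)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]
    val = float(np.sum(T * C) + eps * np.sum(T * (np.log(T) - 1.0)))
    return val, eps * np.log(u)

x = np.linspace(0.0, 1.0, 4)
C = (x[:, None] - x[None, :]) ** 2
a = np.array([0.4, 0.3, 0.2, 0.1])
b = np.array([0.1, 0.2, 0.3, 0.4])
eps, n, R = 0.2, 2000, 300

S_true, f = sinkhorn_value(a, b, C, eps)
stats = []
for _ in range(R):                        # replicate sqrt(n)*(S(a_hat,b)-S(a,b))
    a_hat = rng.multinomial(n, a) / n
    S_hat, _ = sinkhorn_value(a_hat, b, C, eps)
    stats.append(np.sqrt(n) * (S_hat - S_true))

Sigma = np.diag(a) - np.outer(a, a)       # multinomial covariance Sigma(a)
sd_theory = float(np.sqrt(f @ Sigma @ f)) # CLT std via the delta method
print(np.std(stats), sd_theory)           # empirical vs. theoretical std
```

The additive-constant ambiguity of $f$ does not matter here, since $\Sigma(a)\mathbf{1} = 0$.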
Bootstrap procedures enable practical inference:
- Under $a \neq b$, the law of $\rho_{n,m}\big( S_{\varepsilon}(\hat{a}_n^{*}, \hat{b}_m^{*}) - S_{\varepsilon}(\hat{a}_n, \hat{b}_m) \big)$ (with bootstrap resampling) converges to the correct asymptotic distribution.
- Under $a = b$, the standard bootstrap fails due to first-order degeneracy; a second-order correction (Babu correction) recovers consistency.
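In the non-degenerate regime $a \neq b$, a basic (reflected percentile) bootstrap already works. The following one-sample sketch builds a confidence interval for $S_{\varepsilon}(a, b)$ from multinomial counts; helper names and the plain-entropy form are choices made here.

```python
import numpy as np

rng = np.random.default_rng(1)

def sinkhorn_value(a, b, C, eps, n_iter=500):
    """Plain-entropy entropic OT value (sketch)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]
    return float(np.sum(T * C) + eps * np.sum(T * (np.log(T) - 1.0)))

def bootstrap_ci(counts, b, C, eps, n_boot=200, level=0.95):
    """Basic-bootstrap confidence interval for S_eps(a, b) from one-sample
    multinomial counts; valid in the non-degenerate regime a != b."""
    n = counts.sum()
    a_hat = counts / n
    point = sinkhorn_value(a_hat, b, C, eps)
    reps = [sinkhorn_value(rng.multinomial(n, a_hat) / n, b, C, eps)
            for _ in range(n_boot)]             # resample from a_hat
    q_lo, q_hi = np.quantile(reps, [(1 - level) / 2, (1 + level) / 2])
    return 2 * point - q_hi, 2 * point - q_lo   # reflected (basic) interval

x = np.linspace(0.0, 1.0, 4)
C = (x[:, None] - x[None, :]) ** 2
a = np.array([0.4, 0.3, 0.2, 0.1])              # true a != b: non-degenerate
b = np.array([0.1, 0.2, 0.3, 0.4])
counts = rng.multinomial(2000, a)
lo, hi = bootstrap_ci(counts, b, C, eps=0.2)
print(lo, hi)
```

Under the null $a = b$ this interval is no longer valid, which is exactly where the Babu-type second-order correction is needed.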
4. Limit Behavior: The $\varepsilon \to 0$ Asymptotics and Recovery of OT
As the regularization parameter vanishes, EOT recovers the classical Kantorovich OT problem. Provided $\varepsilon = \varepsilon_n \to 0$ sufficiently fast relative to the sample size (a mild growth condition), the central limit theorem for EOT converges to the classical OT central limit law, and regularized dual optimizers converge to (possibly non-unique) Kantorovich duals (Bigot et al., 2017).
This formalizes the exact sense in which EOT interpolates between maximum-entropy (fully regularized) couplings and the singular, possibly non-unique unregularized OT solutions, providing a controlled, smooth approximation valid in both computational and statistical limits.
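This recovery can be observed directly by comparing the transport cost $\langle C, T_{\varepsilon} \rangle$ of the entropic plan with the exact OT cost as $\varepsilon$ shrinks. On the line with a convex cost, the exact value is available from the monotone (north-west corner) coupling; helper names below are illustrative.

```python
import numpy as np

def sinkhorn_plan_cost(a, b, C, eps, n_iter=20000):
    """Transport cost <C, T_eps> of the entropic plan (dense sketch)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]
    return float(np.sum(T * C))

def ot_1d_sq(a, b, x):
    """Exact 1-D OT with squared cost via the monotone (north-west
    corner) coupling, optimal for convex costs on the line."""
    i = j = 0
    ai, bj = a[0], b[0]
    cost = 0.0
    while i < len(a) and j < len(b):
        m = min(ai, bj)                  # move as much mass as both allow
        cost += m * (x[i] - x[j]) ** 2
        ai -= m
        bj -= m
        if ai <= 1e-12:                  # source atom exhausted
            i += 1
            ai = a[i] if i < len(a) else 0.0
        if bj <= 1e-12:                  # target atom exhausted
            j += 1
            bj = b[j] if j < len(b) else 0.0
    return cost

x = np.linspace(0.0, 1.0, 4)
C = (x[:, None] - x[None, :]) ** 2
a = np.array([0.4, 0.3, 0.2, 0.1])
b = np.array([0.1, 0.2, 0.3, 0.4])
exact = ot_1d_sq(a, b, x)
gaps = [sinkhorn_plan_cost(a, b, C, eps) - exact for eps in (0.5, 0.2, 0.05)]
print(gaps)                              # shrinks toward 0 as eps decreases
```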
5. Practical Applications and Empirical Behavior
Empirical studies confirm theory in both synthetic and real-data regimes:
- On discrete grids in low dimension, the empirical distribution of $\sqrt{n}\big( S_{\varepsilon}(\hat{a}_n, b) - S_{\varepsilon}(a, b) \big)$ matches the CLT prediction even for moderate sample sizes.
- For testing equality of color histograms of autumn vs. winter image sets (3D histograms on a discrete grid), two-sample bootstrap tests yielded rejection statistics well beyond the bootstrap band, detecting subtle discrepancies that a classical test missed.
- Power analysis for one-sample tests (a uniform null vs. alternatives with a linear trend) shows a rapid increase in rejection rate as the signal departs from the null.
- Relative-entropy and plain-entropy forms of regularization yield comparable discriminative performance.
The Sinkhorn divergence and its centered variant thus provide computationally practical and statistically effective tools for measuring discrepancies, performing clustering, hypothesis testing, and other inferential tasks on high-dimensional discrete distributions (Bigot et al., 2017).
6. Key Theoretical and Methodological Insights
The combination of the following structural properties makes EOT particularly attractive:
- Efficient computation: $O(N^2)$ arithmetic per Sinkhorn iteration, with linear convergence and practical scalability to large discrete domains.
- Differentiability: Fréchet-differentiable everywhere, enabling optimization and gradient-based learning.
- Rigorous inference: Established CLTs and bootstrap for both the uncentered and centered losses, covering both the alternative and null regimes.
- Bias-variance control: the regularization parameter $\varepsilon$ explicitly tunes the bias–variance trade-off between approximation to OT and numerical/statistical stability.
- Limit compatibility: as $\varepsilon \to 0$, the smoothness and statistical properties recover those of the original, unregularized OT.
These features ensure that entropy-regularized OT and its Sinkhorn divergence yield a full analytic toolkit for statistical inference, learning, and testing on discrete distributions in high-dimensional spaces, with robust guarantees and practical performance (Bigot et al., 2017).