
Entropically Regularized Optimal Transport (EOT)

Updated 9 February 2026
  • Entropically Regularized Optimal Transport (EOT) is a method that introduces an entropy term to the classical optimal transport problem, ensuring smoothness and efficient computation via the Sinkhorn algorithm.
  • The Sinkhorn algorithm, along with its variants like Greenkhorn, iteratively updates scaling factors to enforce marginal constraints with linear convergence and reduced per-iteration cost.
  • EOT comes with rigorous statistical tools, including central limit theorems and bootstrap methods, supporting valid inference and explicit bias–variance control in large-scale data analysis.

Entropically Regularized Optimal Transport (EOT) is a computational and statistical framework for optimal transport (OT) between probability distributions in which the Kantorovich objective is modified by an additional entropy (or relative entropy) term. This regularization yields a strictly convex cost that facilitates efficient computation via the Sinkhorn algorithm and provides strong smoothness and differentiability properties. EOT and its associated Sinkhorn divergences/metrics have become fundamental tools in large-scale data analysis, machine learning, and statistics due to their algorithmic tractability and well-understood inferential properties.

1. Mathematical Formulation: Primal, Dual, and Centered Loss

On a finite metric space $X = \{x_1, \ldots, x_N\}$ with cost matrix $C \in \mathbb{R}_+^{N \times N}$ specified by $c_{ij} = d(x_i, x_j)^p$, the EOT cost between probability weights $a, b \in \Sigma_N := \{a \in \mathbb{R}_+^N : \sum_i a_i = 1\}$ is

$$W_{p,\varepsilon}^p(a, b) := \min_{T \in U(a,b)} \langle T, C \rangle + \varepsilon H(T \,|\, a \otimes b),$$

where $U(a, b)$ is the transport polytope of couplings with prescribed marginals $a$ and $b$, and $H(T \,|\, a \otimes b) = \sum_{i,j} t_{ij} \log\bigl(\tfrac{t_{ij}}{a_i b_j}\bigr)$ is the relative entropy.

The Fenchel dual is

$$W_{p, \varepsilon}^p(a, b) = \max_{u, v \in \mathbb{R}^N} \; u^\top a + v^\top b - \varepsilon \sum_{i,j} \exp\Big(-\frac{c_{ij} - u_i - v_j}{\varepsilon}\Big) a_i b_j.$$

The dual optimizers are unique up to additive constants (shifting $u$ by $+c$ and $v$ by $-c$ leaves the objective unchanged).

The EOT cost $W_{p,\varepsilon}^p$ is not a true metric since $W_{p,\varepsilon}^p(a, a) > 0$ in general. The centered Sinkhorn loss is defined as

$$S_{p, \varepsilon}(a, b) := W_{p, \varepsilon}^p(a, b) - \frac{1}{2}\Big(W_{p,\varepsilon}^p(a, a) + W_{p,\varepsilon}^p(b, b)\Big),$$

which is non-negative and vanishes if and only if $a = b$. As $\varepsilon \to 0$, $S_{p,\varepsilon}(a,b)$ converges to the (unregularized) $p$-Wasserstein cost $W_p^p(a,b)$ (Bigot et al., 2017).

2. Algorithmic Computation: Sinkhorn and Greedy Methods

Given the Gibbs kernel $K = \exp(-C/\varepsilon)$ (componentwise exponential), the unique optimal coupling has the factorized form

$$T^* = \mathrm{Diag}(u)\, K\, \mathrm{Diag}(v),$$

where $u, v \in \mathbb{R}_+^N$ are scaling vectors. They are obtained by alternately enforcing the marginal constraints:
$$u^{(\ell+1)} = a \,./\, (K v^{(\ell)}), \qquad v^{(\ell+1)} = b \,./\, (K^\top u^{(\ell+1)}),$$
where $./$ denotes componentwise division. This Sinkhorn algorithm converges geometrically (linearly) whenever $K$ has positive entries, at $O(N^2)$ arithmetic cost per iteration; $50$–$200$ iterations are typical for moderate $N$.
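
A minimal NumPy sketch of these alternating updates (the toy grid, function name, and tolerances are illustrative, not from the paper):

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=2000, tol=1e-12):
    """Alternating Sinkhorn scaling: returns the coupling
    T* = Diag(u) K Diag(v) and the transport part <T*, C>."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u_prev = u
        u = a / (K @ v)                  # enforce row marginals
        v = b / (K.T @ u)                # enforce column marginals
        if np.max(np.abs(u - u_prev)) < tol:
            break
    T = u[:, None] * K * v[None, :]
    return T, (T * C).sum()

# Toy example on a 1-D grid with squared-distance cost (p = 2).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
a = np.full(50, 1 / 50)
b = rng.random(50); b /= b.sum()
T, cost = sinkhorn(a, b, C, eps=0.05)
```

For very small $\varepsilon$, the updates are usually stabilized in the log domain to avoid under/overflow in $K$.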

Variants such as Greenkhorn and Greedy Stochastic Sinkhorn update only the most violated row or column at each step; this reduces the per-iteration cost, and in favorable regimes they outperform standard Sinkhorn in wall-clock time (Abid et al., 2018).
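
A sketch of the greedy row/column idea, measuring violation by absolute deviation (published Greenkhorn variants select coordinates via a Bregman-type divergence; this simplification and all names here are illustrative):

```python
import numpy as np

def greenkhorn(a, b, C, eps, n_iter=50000, tol=1e-9):
    """Greedy Sinkhorn: rescale only the single most violated
    row or column per step, maintaining running marginals in O(N)."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    T = u[:, None] * K * v[None, :]
    r, c = T.sum(axis=1), T.sum(axis=0)       # running marginals
    for _ in range(n_iter):
        i = np.argmax(np.abs(r - a))          # worst row violation
        j = np.argmax(np.abs(c - b))          # worst column violation
        if max(abs(r[i] - a[i]), abs(c[j] - b[j])) < tol:
            break
        if abs(r[i] - a[i]) >= abs(c[j] - b[j]):
            old = T[i].copy()
            u[i] = a[i] / (K[i] @ v)          # O(N) row rescale
            T[i] = u[i] * K[i] * v
            r[i] = a[i]; c += T[i] - old
        else:
            old = T[:, j].copy()
            v[j] = b[j] / (K[:, j] @ u)       # O(N) column rescale
            T[:, j] = u * K[:, j] * v[j]
            c[j] = b[j]; r += T[:, j] - old
    return T

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
C = np.abs(x[:, None] - x[None, :])
a = rng.random(20); a /= a.sum()
b = rng.random(20); b /= b.sum()
T = greenkhorn(a, b, C, eps=0.2)
```

Each step touches a single row or column of $T$, which is what makes the per-iteration cost $O(N)$ rather than $O(N^2)$.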

3. Statistical Properties: Central Limit Theorems and Bootstrap

The map $(a, b) \mapsto W^p_{p,\varepsilon}(a, b)$ is Fréchet-differentiable on $\Sigma_N \times \Sigma_N$, with derivative

$$\nabla W^p_{p,\varepsilon}(a, b)(h_1, h_2) = \langle u_\varepsilon, h_1 \rangle + \langle v_\varepsilon, h_2 \rangle,$$

where $(u_\varepsilon, v_\varepsilon)$ is any pair of dual optimizers.
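
The derivative formula can be checked numerically: recover $(u_\varepsilon, v_\varepsilon)$ from the Sinkhorn scaling vectors and compare $\langle u_\varepsilon, h_1 \rangle$ against a finite difference along a simplex-tangent direction. This is a sketch; the recovery $u_\varepsilon = \varepsilon \log(u/a)$ assumes the relative-entropy formulation above, and the grid and helper names are illustrative:

```python
import numpy as np

def eot_cost_and_duals(a, b, C, eps, n_iter=3000):
    """Sinkhorn solve; returns the EOT cost in the relative-entropy
    form and the dual potentials (u_eps, v_eps)."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    T = u[:, None] * K * v[None, :]
    cost = (T * C).sum() + eps * (T * np.log(T / np.outer(a, b))).sum()
    # Duals of the relative-entropy dual problem, up to a constant.
    return cost, eps * np.log(u / a), eps * np.log(v / b)

rng = np.random.default_rng(1)
n = 20
x = np.linspace(0.0, 1.0, n)
C = np.abs(x[:, None] - x[None, :])
a = rng.random(n); a /= a.sum()
b = rng.random(n); b /= b.sum()

W, u_eps, v_eps = eot_cost_and_duals(a, b, C, eps=0.1)
h = rng.standard_normal(n); h -= h.mean()    # tangent: sums to zero
t = 1e-6
W_t, _, _ = eot_cost_and_duals(a + t * h, b, C, eps=0.1)
fd = (W_t - W) / t                           # finite difference in direction h
```

The finite difference `fd` should agree with the linear functional $\langle u_\varepsilon, h \rangle$ up to discretization error.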

Let $\hat a_n$ and $\hat b_m$ be empirical measures built from $n$ and $m$ i.i.d. samples. With $m/(n+m) \to \gamma \in (0,1)$ and multinomial covariance matrices $\Sigma(a), \Sigma(b)$, asymptotic normality holds:

  • For the (non-centered) Sinkhorn divergence,

$$\sqrt{n}\,\Big(W_{p,\varepsilon}^p(\hat a_n, b) - W_{p,\varepsilon}^p(a, b)\Big) \to_d \langle G, u_\varepsilon \rangle,$$

and for the two-sample statistic,

$$\rho_{n,m}\,\Big(W_{p,\varepsilon}^p(\hat a_n, \hat b_m) - W_{p,\varepsilon}^p(a, b)\Big) \to_d \sqrt{\gamma}\,\langle G, u_\varepsilon \rangle + \sqrt{1-\gamma}\,\langle H, v_\varepsilon \rangle,$$

where $\rho_{n,m} := \sqrt{nm/(n+m)}$ and $G \sim \mathcal{N}(0, \Sigma(a))$, $H \sim \mathcal{N}(0, \Sigma(b))$ are independent Gaussian limits.

  • For the centered Sinkhorn loss $S_{p,\varepsilon}$, similar central limit theorems describe both the alternative ($a \neq b$) and null ($a = b$) regimes. Under $a = b$, the limit is non-Gaussian: a weighted mixture of chi-square variables determined by the Hessian of $S_{p,\varepsilon}$ at $(a, a)$ (Bigot et al., 2017).

These results extend the statistical theory of OT to the regularized regime, and the limit laws are essential for valid hypothesis tests and confidence intervals.

Bootstrap procedures enable practical inference:

  • Under $a \neq b$, the law of $\sqrt{n}\,(W_{p,\varepsilon}^p(\hat a_n^*, b) - W_{p,\varepsilon}^p(\hat a_n, b))$, where $\hat a_n^*$ is a bootstrap resample of $\hat a_n$, converges to the correct asymptotic distribution.
  • Under $a = b$, the standard bootstrap fails due to first-order degeneracy; a second-order correction (Babu correction) recovers consistency.
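
Under $a \neq b$, the recentred one-sample bootstrap can be sketched as follows (the grid, sample sizes, number of resamples, and the helper `sinkhorn_cost` are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps, n_iter=1000):
    """EOT cost W_eps(a, b) in the relative-entropy form."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    T = u[:, None] * K * v[None, :]
    return (T * C).sum() + eps * (T * np.log(T / np.outer(a, b))).sum()

rng = np.random.default_rng(2)
N, n, eps = 15, 2000, 0.1
x = np.linspace(0.0, 1.0, N)
C = (x[:, None] - x[None, :]) ** 2
b = np.full(N, 1 / N)                       # reference measure
a = np.linspace(1.0, 2.0, N); a /= a.sum()  # true measure, a != b

samples = rng.choice(N, size=n, p=a)
a_hat = np.bincount(samples, minlength=N) / n

# Bootstrap: resample from a_hat, recentre at the plug-in statistic.
stat_hat = sinkhorn_cost(a_hat, b, C, eps)
boot = []
for _ in range(200):
    res = rng.choice(N, size=n, p=a_hat)
    a_star = np.bincount(res, minlength=N) / n
    boot.append(np.sqrt(n) * (sinkhorn_cost(a_star, b, C, eps) - stat_hat))
lo, hi = np.percentile(boot, [2.5, 97.5])
ci = (stat_hat - hi / np.sqrt(n), stat_hat - lo / np.sqrt(n))  # 95% CI
```

The bootstrap quantiles of the recentred statistic stand in for the (unknown) limiting law $\langle G, u_\varepsilon \rangle$ when forming confidence intervals.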

4. Limit Behavior: The $\varepsilon \to 0$ Asymptotics and Recovery of OT

As the regularization parameter vanishes, EOT recovers the classical Kantorovich OT problem. Under the mild decay condition $\sqrt{n}\,\varepsilon_n \log(1/\varepsilon_n) \to 0$, the central limit theorem for EOT converges to the classical OT central limit law, and the regularized dual optimizers converge to (possibly non-unique) Kantorovich duals (Bigot et al., 2017).

This formalizes the exact sense in which EOT interpolates between maximum-entropy (fully regularized) couplings and the singular, possibly non-unique unregularized OT solutions, providing a controlled, smooth approximation valid in both computational and statistical limits.
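
This interpolation can be observed numerically on the real line, where the unregularized $W_1$ has a closed form via CDF differences (the grid and the chosen $\varepsilon$ values are illustrative):

```python
import numpy as np

def sinkhorn_cost(a, b, C, eps, n_iter=5000):
    """EOT cost W_eps(a, b) = <T*, C> + eps * H(T* | a x b)."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    T = u[:, None] * K * v[None, :]
    return (T * C).sum() + eps * (T * np.log(T / np.outer(a, b))).sum()

rng = np.random.default_rng(3)
N = 30
x = np.linspace(0.0, 1.0, N)
C = np.abs(x[:, None] - x[None, :])            # p = 1 cost on the line
a = rng.random(N); a /= a.sum()
b = rng.random(N); b /= b.sum()

# Exact (unregularized) W_1 from the CDF formula on the real line.
W_exact = np.sum(np.abs(np.cumsum(a) - np.cumsum(b))[:-1] * np.diff(x))

# Entropic bias shrinks monotonically as eps decreases toward zero.
errs = [sinkhorn_cost(a, b, C, eps) - W_exact for eps in (0.5, 0.2, 0.05)]
```

Since the relative-entropy term is non-negative, $W_{1,\varepsilon} \geq W_1$ for every $\varepsilon > 0$, and the gap decreases as $\varepsilon \to 0$.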

5. Practical Applications and Empirical Behavior

Empirical studies confirm theory in both synthetic and real-data regimes:

  • On $L \times L$ discrete grids in $\mathbb{R}^2$ (e.g., $L = 5, 10, 20$), the empirical distribution of $\sqrt{n}\,(S_{p,\varepsilon}(\hat a_n, b) - S_{p,\varepsilon}(a, b))$ matches the CLT prediction even for moderate $n \approx 10^3$–$10^4$.
  • For the test $H_0: a = b$ on color histograms of autumn vs. winter image sets (3D histograms on a $16^3$ grid), two-sample bootstrap tests with $\varepsilon = 10, 100$ rejected well beyond the $95\%$ bootstrap band, detecting subtle discrepancies that a classical $\chi^2$ test missed.
  • Power analysis for one-sample tests (uniform $a$ vs. $b$ with a linear trend) shows the rejection rate rising rapidly as the signal departs from the null.
  • Relative-entropy and plain-entropy forms of regularization yield comparable discriminative performance.

The Sinkhorn divergence and its centered variant thus provide computationally practical and statistically effective tools for measuring discrepancies, performing clustering, hypothesis testing, and other inferential tasks on high-dimensional discrete distributions (Bigot et al., 2017).

6. Key Theoretical and Methodological Insights

The combination of the following structural properties makes EOT particularly attractive:

  • Efficient computation: $O(N^2)$ arithmetic per Sinkhorn iteration with linear convergence and practical scalability to large discrete domains.
  • Differentiability: Fréchet-differentiable everywhere, enabling optimization and gradient-based learning.
  • Rigorous inference: Established CLTs and bootstrap for both the uncentered and centered losses, covering both the alternative and null regimes.
  • Bias–variance control: the regularization parameter $\varepsilon$ explicitly tunes the trade-off between approximation of unregularized OT and numerical/statistical stability.
  • Limit compatibility: as $\varepsilon \to 0$, the smoothness and statistical properties recover those of the original, unregularized OT.

These features ensure that entropy-regularized OT and its Sinkhorn divergence yield a full analytic toolkit for statistical inference, learning, and testing on discrete distributions in high-dimensional spaces, with robust guarantees and practical performance (Bigot et al., 2017).
