Low-Rank Structure in Optimal Transport
- Low-rank structure in optimal transport is defined as imposing low-dimensional constraints on couplings or cost matrices, which enhances computational tractability and interpretability.
- Algorithmic strategies such as nuclear norm regularization and alternating minimization enable efficient recovery of low-rank approximations in high-dimensional transport problems.
- These low-rank methods improve scalability and statistical robustness, offering practical benefits in applications like genomics, imaging, and deep learning architectures.
Low-rank structure in optimal transport refers to the identification, imposition, or recovery of couplings, cost functions, or related objects that are of low (matrix or tensor) rank within the optimal transport (OT) formalism. This structural regularity enables significant gains in computational tractability, statistical robustness, and interpretability, and has led to a diverse array of theoretical frameworks, algorithmic methods, and application-driven innovations across classical two-marginal OT, multimarginal OT, and even in non-standard settings such as OT-based deep learning architectures.
1. Mathematical Formulations of Low-Rank Structure in OT
At its core, the classical OT problem between probability measures $\mu$ and $\nu$ on spaces $\mathcal{X}$ and $\mathcal{Y}$ with cost $c : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ seeks a coupling $\pi \in \Pi(\mu, \nu)$ minimizing $\int c \, d\pi$. Low-rank structure can arise in several mathematically distinct locations:
- Coupling matrix low-rankness: Imposing $\operatorname{rank}(P) \le r$ for discrete couplings $P \in \mathbb{R}_{+}^{n \times m}$, leading to factorizations such as $P = QR^\top$ or, more specifically, $P = Q \operatorname{diag}(1/g) R^\top$ with nonnegative factors $Q, R$ and an inner marginal $g \in \mathbb{R}_{+}^{r}$.
This structure underpins approaches like Low-Rank Sinkhorn Factorization and LOT (Low-Rank Optimal Transport) (Scetbon et al., 2021, Scetbon et al., 2022).
- Cost or affinity matrix low-rankness: In problems involving the estimation of bilinear affinity functions, as in $c(x, y) = x^\top A y$, low-rankness of $A$ is imposed via its SVD, $A = U \Sigma V^\top$ with few nonzero singular values (Dupuy et al., 2016).
- Transport rank / factored couplings: A coupling $\pi$ is said to have transport rank $r$ if it decomposes as $\pi = \sum_{k=1}^{r} \lambda_k \, (\mu_k \otimes \nu_k)$ with weights $\lambda_k \ge 0$ and probability measures $\mu_k, \nu_k$. This factorization can be interpreted as a sum over product couplings, inducing clustering structure and leading to practical and statistical improvements (Forrow et al., 2018).
- Low-rank tensor structure in multimarginal settings: High-dimensional cost tensors are approximated as sums of rank-1 outer products plus a sparse component, or via tensor network contractions with low-rank surrogates for factors (Altschuler et al., 2020, Strössner et al., 2022).
- Low-rank regularization: Regularization schemes such as the nuclear norm (Schatten-$1$ norm), general Schatten-$p$ norms, or other convex surrogates directly penalize the rank or encourage low-dimensionality in the coupling or induced barycentric maps (Dupuy et al., 2016, Maunu, 13 Oct 2025).
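The coupling-matrix factorization in the first bullet can be written $P = Q\,\operatorname{diag}(1/g)\,R^\top$; the following hand-built toy instance (uniform marginals, not the output of any cited solver) shows how the factors encode marginals and an inner marginal $g$ over latent clusters:

```python
import numpy as np

n, r = 4, 2
a = np.full(n, 0.25)                 # left marginal
b = np.full(n, 0.25)                 # right marginal
g = np.array([0.5, 0.5])             # inner marginal over r latent "clusters"

# Nonnegative factors: rows sum to the marginals, columns sum to g.
Q = np.array([[0.20, 0.05],
              [0.20, 0.05],
              [0.05, 0.20],
              [0.05, 0.20]])
R = Q.copy()                         # reuse the same factor for b

P = Q @ np.diag(1.0 / g) @ R.T       # coupling of rank <= r

assert np.allclose(P.sum(axis=1), a) # correct left marginal
assert np.allclose(P.sum(axis=0), b) # correct right marginal
assert np.linalg.matrix_rank(P) <= r
```

The block structure of `Q` concentrates the first two points on latent cluster 0 and the last two on cluster 1, so the resulting plan exhibits exactly the clustering behavior described under transport rank.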
2. Algorithms for Enforcing and Exploiting Low-Rank Structure
A spectrum of algorithmic strategies operationalizes low-rank structure:
- Convex relaxation via nuclear/Schatten norms: Imposing a nuclear norm penalty, as in $\min_{\pi \in \Pi(a,b)} \langle C, \pi \rangle + \tau \|\pi\|_{*}$, or more generally Schatten-$p$ norm penalties, provides computationally tractable convex programs that serve as surrogates for direct rank minimization (Dupuy et al., 2016, Maunu, 13 Oct 2025). Optimal solutions are recovered via proximal gradient or mirror descent methods, often using explicit singular value thresholding.
- Factored parameterizations and block-coordinate solvers: Mirror descent, Bregman (KL) projections, and Dykstra's algorithm are used to update nonnegative factors $(Q, R, g)$, enforcing marginal and inner-consistency constraints. Advanced variants use latent-coupling factorizations that introduce an auxiliary coupling matrix $T$ between cluster marginals, allowing for greater flexibility and decoupled updates (Scetbon et al., 2021, Halmos et al., 2024).
- Alternating minimization for factored barycenters: For transport-rank-regularized couplings, the solver alternates between updating the transport plans (using, e.g., Sinkhorn iterations) and the hub locations (barycenter support points), implementing the factored Wasserstein approach (Forrow et al., 2018).
- Hierarchical/multiscale refinement for bijective mapping: The Hierarchical Refinement (HiRef) algorithm hierarchically partitions the problem into low-rank OT subproblems, recursively refining the granularity until a one-to-one bijective mapping is constructed, while only ever representing submatrices of low rank, yielding efficient computation for assignment problems (Halmos et al., 4 Mar 2025).
- Low-rank tensor networks for multimarginal OT: In multi-marginal entropic OT, low-rank approximations (truncated SVD or Chebyshev interpolants) of factor tensors within graphical models make contraction and marginalization efficient, even as the full tensor complexity is intractable (Strössner et al., 2022).
- Low-rank linear-time attention in deep models: LOTFormer introduces a pivot measure of small support $r$, assembling the final attention matrix via the "glued" product of two entropic OT plans of sizes $n \times r$ and $r \times n$, enacting doubly stochastic, low-rank attention in time linear in the sequence length (Shahbazi et al., 27 Sep 2025).
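The key primitive behind the nuclear-norm relaxation in the first bullet is singular value soft-thresholding, the proximal operator of $\tau \|\cdot\|_{*}$. A minimal sketch (the function name `svt` and the test matrix are illustrative, not from any cited paper):

```python
import numpy as np

def svt(M, tau):
    """Proximal operator of tau * ||.||_* : soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

# Thresholding a noisy near-rank-1 matrix collapses the small noise
# singular values and returns a genuinely low-rank estimate.
rng = np.random.default_rng(1)
u, v = rng.standard_normal(8), rng.standard_normal(6)
M = np.outer(u, v) + 0.01 * rng.standard_normal((8, 6))
M_lr = svt(M, tau=0.5)
```

In a proximal gradient loop for the penalized OT objective, a step like `svt` would alternate with a (sub)gradient step on the transport cost and re-projection onto the marginal constraints.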
3. Theoretical Properties, Error Bounds, and Statistical Advantages
Low-rank structure provides not only algorithmic acceleration but also enhanced statistical properties and theoretical guarantees.
- Approximation error bounds: For low-rank constrained couplings (LOT), the sub-optimality with respect to the true OT cost vanishes as the rank budget $r$ grows toward the full rank in discrete settings; in continuous and metric cases, rates linked to covering numbers and Wasserstein metric entropy are established (Scetbon et al., 2022).
- Sample complexity improvements: In factored OT, statistical estimation rates scale as $n^{-1/2}$ up to dimension-dependent logarithmic or polynomial terms, as opposed to the $n^{-1/d}$ minimax rate for vanilla plug-in estimators, thus circumventing the curse of dimensionality (Forrow et al., 2018).
- Convexity and recovery guarantees: Nuclear/Schatten norm regularization ensures convexity when $p \ge 1$, yielding tractable KKT conditions. In clustered or well-separated settings, block-diagonal or low-rank optimal plans are recovered exactly up to sharp thresholds on the regularization parameter $\tau$, providing guarantees for true low-rank recovery in both the coupling and the barycentric projection (Maunu, 13 Oct 2025).
- Non-asymptotic convergence: Mirror descent schemes for generic low-rank OT parameterizations achieve explicit stationarity rates with precise measures of projected gradient norms, supported by analysis of the objective’s smoothness relative to negative entropy (Scetbon et al., 2021, Halmos et al., 2024).
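One elementary fact underlying the approximation-error bounds above is that any rank-constrained plan upper-bounds the unconstrained OT cost. A toy check (assuming SciPy's `linprog` for the exact plan; the product measure $ab^\top$ is the unique rank-1 nonnegative matrix with marginals $a$ and $b$, so it is the only feasible rank-1 "coupling"):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n = 4
a = np.full(n, 1.0 / n)
b = np.full(n, 1.0 / n)
C = rng.random((n, n))

# Exact OT via linear programming: min <C, P> s.t. P 1 = a, P^T 1 = b, P >= 0.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0   # row-sum constraint for row i
    A_eq[n + i, i::n] = 1.0            # column-sum constraint for column i
res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]))
ot_cost = res.fun

# The rank-1 plan is the product measure a b^T; its cost bounds OT from above.
rank1_cost = a @ C @ b
assert ot_cost <= rank1_cost + 1e-9
```

Increasing the admissible rank $r$ enlarges the feasible set, so the constrained value decreases monotonically toward the exact OT cost, which is the qualitative content of the LOT bounds.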
4. Practical Implications: Scalability, Interpretability, and Applications
Low-rank structure is crucial for scaling OT to high-dimensional and large-scale datasets, as well as for enabling richer downstream insights:
- Computational scaling: By confining the search space to low-rank couplings or factorizations, storage and runtime requirements drop from quadratic $O(nm)$ or $O(n^2)$ to $O((n+m)r)$ or $O(nr)$, using $O((n+m)r)$-sized representations and matrix-vector products even for problems previously impossible to address with standard Sinkhorn iterations (Halmos et al., 4 Mar 2025, Scetbon et al., 2023).
- Interpretability and clustering: Because low-rank or transport rank constraints induce a decomposition into a few latent product components, the factors naturally cluster the mass in the coupling. This interpretable structure aligns with clustering tasks, as exemplified by the equivalence of rank-$k$ LOT minimization and $k$-means in the squared Euclidean setting (Scetbon et al., 2022).
- Reduced overfitting and regularization: Nuclear norm regularization and its variants avoid overfitting in high-dimensional settings with limited observations, effectively controlling covariance mismatches and suppressing estimation noise (Dupuy et al., 2016, Maunu, 13 Oct 2025).
- Versatility across objectives and settings: Extensions to multimarginal OT and generalizations such as Gromov–Wasserstein and Fused GW can all benefit from latent coupling and block-coordinate mirror descent methods. Unbalanced OT problems incorporate mass penalization flexibly within the low-rank framework (Scetbon et al., 2023, Halmos et al., 2024).
- Real-world applications: Recent advances have demonstrated substantial value in single-cell genomics (batch correction and alignment), large-scale graph alignment and clustering, long-context transformer attention, and color transfer in image processing (Forrow et al., 2018, Shahbazi et al., 27 Sep 2025, Halmos et al., 2024, Strössner et al., 2022).
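The computational-scaling point above rests on a simple identity: a factored plan can be applied to a vector without ever materializing the dense coupling. A minimal sketch with arbitrary nonnegative factors (illustrating only the matrix-vector identity, not a valid coupling):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, r = 1000, 800, 5
Q = rng.random((n, r))
R = rng.random((m, r))
g = rng.random(r) + 0.1
x = rng.random(m)

# Dense route: O(n*m) storage and O(n*m) time per product.
P = Q @ np.diag(1.0 / g) @ R.T
y_dense = P @ x

# Factored route: never form P; O((n+m)*r) work per product.
y_factored = Q @ ((R.T @ x) / g)

assert np.allclose(y_dense, y_factored)
```

All the barycentric projections, marginal checks, and mirror-descent updates used by low-rank solvers reduce to such products, which is why memory stays linear in $n + m$.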
5. Limitations and Hybrid/Future Directions
Despite significant progress, low-rank structure in OT faces subtleties and open problems:
- Expressive limitations: Pure low-rank approximations may fail to faithfully capture full-rank (e.g., Monge map-induced bijective) couplings. Hybrid models that combine low-rank and sparse corrections (e.g., $P = L + S$ with $L$ low-rank and $S$ sparse) provide a more flexible basis for accurate approximation (Liu et al., 2021, Altschuler et al., 2020).
- Approximation-refinement trade-offs: Hierarchical refinement schemes can achieve high-resolution mappings via recursive multiscale partitioning, bridging from coarse low-rank “coclusters” to fine bijections (Halmos et al., 4 Mar 2025).
- Regularization parameter selection: Choosing regularization parameters (e.g., the nuclear norm weight $\tau$, target rank $r$, entropic strength $\varepsilon$) remains an open issue, with ongoing work towards adaptive selection rules and debiasing strategies (Scetbon et al., 2022).
- Theory–practice interface: While statistical rates and recovery bounds are established, practical lower bounds, adaptive ranks, and analysis for generic, non-metric costs remain open areas (Maunu, 13 Oct 2025, Scetbon et al., 2022).
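The expressive limitation and its hybrid remedy can be seen numerically: a truncated SVD alone misses sparse "spikes" that a low-rank-plus-sparse model represents exactly. A synthetic sketch (not the algorithm of any cited paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50

# Low-rank "background" mass plus a few sparse, high-mass spikes.
L = np.outer(rng.random(n), rng.random(n)) / n   # rank 1, small entries
S = np.zeros((n, n))
S[0, 7] = S[13, 2] = S[29, 41] = 1.0             # three sparse spikes
P = L + S

# Best rank-1 approximation via truncated SVD cannot capture the spikes ...
U, s, Vt = np.linalg.svd(P)
P_r1 = s[0] * np.outer(U[:, 0], Vt[0])
err_lowrank = np.linalg.norm(P - P_r1)

# ... while the hybrid decomposition L + S is exact by construction.
err_hybrid = np.linalg.norm(P - (L + S))
assert err_hybrid < 1e-12 < err_lowrank
```

In practice $L$ and $S$ are of course unknown and must be separated, which is exactly the robust-PCA-style recovery problem referenced in Section 6.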
6. Connections to Broader Computational and Statistical Paradigms
Low-rank OT techniques create points of intersection with the broader literature on matrix/tensor factorization, manifold learning, scalable kernel machines, and high-dimensional statistical inference:
- Robust PCA and matrix recovery: OT plans decomposed as low-rank plus sparse directly mirror robust PCA models, extending interpretability and recovery theory to transportation problems (Liu et al., 2021).
- Low-dimensional embeddings: Approaches that approximate OT distances via low-rank or low-dimensional projections (linear or nonlinear, e.g., via neural networks enforcing 1-Lipschitzness) further expand the toolkit for high-dimensional data analysis (Fulop et al., 2021).
- Generative modeling and deep learning: Low-rank OT formulations (e.g., in transformer attention and batch correction) have been successfully embedded into machine learning pipelines for more efficient and balanced architectures (Shahbazi et al., 27 Sep 2025, Scetbon et al., 2023).
- Tensor networks and graphical models: Hierarchical and low-rank tensor approximations in multi-marginal OT leverage advances in tensor network contraction, graphical modeling, and quantum computing methods (Strössner et al., 2022, Altschuler et al., 2020).
Low-rank structure in optimal transport now underpins a unified set of algorithmic, statistical, and modeling innovations. Through convex relaxations, factored parameterizations, tensor approximations, and hybrid matrix models, the field has achieved efficient, interpretable, and robust solutions to both classical and emerging OT problems in high dimensions and at previously unattainable scales. Theoretical advances and practical applications continue to motivate further exploration of latent and hierarchical low-rank structure and its integration into broader data science and scientific computing paradigms.