
Optimal Adaptive Transport (OAT)

Updated 5 February 2026
  • Optimal Adaptive Transport is a family of methodologies that generalizes classical OT by incorporating adaptive mass constraints, regularization, and feature alignment for tailored data solutions.
  • It employs a range of formulations—from adaptive mass and local regularization to acceleration-based flows—to enhance robustness in applications such as machine learning and sparse graph construction.
  • Practical insights include improved domain adaptation, multimodal alignment, and smoother generative modeling trajectories, supported by strong theoretical guarantees on convergence and uniqueness.

Optimal Adaptive Transport (OAT) is a family of methodologies that extend optimal transport (OT) theory to achieve data-driven and context-sensitive adaptivity, whether by dynamically controlling mass transfer constraints, by leveraging regularization on a local scale, or by integrating structural information into the transport process. OAT spans a wide spectrum of formulations—adaptive mass constraints, adaptive regularization, dynamically learned feature alignments, and “lifted” second-order (acceleration-based) transports. These generalizations of classical OT have seen rapid uptake in machine learning, scientific computing, graph algorithms, decision-focused optimization, dynamical systems, and generative modeling.

1. Theoretical Foundations and Core Variants

Optimal Adaptive Transport encompasses several mathematically distinct but conceptually related extensions of classical OT:

  1. Adaptive Mass Constraints: Instead of enforcing strict equality of source and target marginals (i.e., full mass transportation), adaptive OT allows mass to be partially matched, with marginal inequalities

$$\pi^X_\#\gamma \le \mu, \qquad \pi^Z_\#\gamma \le \nu$$

and the optimizer determines the transported mass and active support adaptively (Yang et al., 7 Mar 2025).

  2. Adaptive Regularization: Instead of a global entropic or quadratic regularization term, adaptive methods (e.g., OTARI) substitute per-row (and/or per-column) constraints, such as enforcing a minimum entropy (perplexity) per source or target point:

$$\psi(P_{i:}) \le \psi(e_{\xi})$$

for all $i$, where $e_{\xi}$ is the maximal allowed regularization pattern (Assel et al., 2023).

  3. Adaptive Feature Alignment: In multi-modal alignment and recommendation, OAT is used to learn feature-level transport plans that adapt distributional supports and semantic representations between modalities, often by integrating learnable residuals on top of structured (e.g., Sinkhorn) plans (Li et al., 31 Jan 2026).
  4. Second-order (Acceleration-based) OAT: In generative modeling, OAT generalizes OT from minimizing kinetic energy (velocity squared) to minimizing action (acceleration squared) over trajectories in state-velocity phase-space, producing smoother, “straighter” flows (Yue et al., 29 Sep 2025).
  5. Adaptive Graph and Neighborhood Construction: OAT is used here to build sparse, data-adaptive graphs by combining linear cost with quadratic regularization, yielding a soft-thresholded transport plan whose sparsity and local adaptivity are controlled by a single parameter (Matsumoto et al., 2022).
  6. Causal (Adapted) OT: In the time series or stochastic process setting, OAT refers to coupling distributions subject to temporal causality constraints, with specialized variants of Sinkhorn for entropically-regularized causal constraints (Eckstein et al., 2022).
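Each variant above can be read as a modification of the classical entropic Sinkhorn loop (clipped scalings, per-point constraints, different costs). For reference, here is a minimal, illustrative implementation of that baseline loop; it is a teaching sketch, not a production solver:

```python
import math

def sinkhorn(C, a, b, eps=0.5, n_iter=300):
    """Classical entropic OT via Sinkhorn scaling.

    C: cost matrix (list of lists), a/b: source/target marginals.
    Returns the plan P with P_ij = u_i * exp(-C_ij/eps) * v_j, whose
    marginals converge to a and b. The adaptive variants in this article
    alter this loop (e.g., clip u, v or constrain rows individually)."""
    n, m = len(a), len(b)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # alternately rescale rows and columns to match the marginals
        for i in range(n):
            u[i] = a[i] / sum(K[i][j] * v[j] for j in range(m))
        for j in range(m):
            v[j] = b[j] / sum(u[i] * K[i][j] for i in range(n))
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

After convergence the row and column sums of the plan equal the prescribed marginals up to numerical tolerance.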

2. Mathematical Formulations

Adaptive Mass OT

The adaptive-mass OT problem seeks

$$\min_{\gamma\in\Gamma_{\le}(\mu,\nu)} \int_{X\times Z} c(x,z)\, d\gamma(x,z)$$

where $\Gamma_{\le}(\mu,\nu)$ is the set of couplings with $\pi^X_\#\gamma\le\mu$ and $\pi^Z_\#\gamma\le\nu$. Unlike classical (full-mass) OT, the total transported mass is decided by the optimizer, and transport plans are supported only on “active regions” where $c(x,z)<0$ or favored by the data (Yang et al., 7 Mar 2025).

Dual formulation (with $c$ possibly mixed-sign):

$$\max_{\phi\le 0}\ \int_X \phi(x)\, d\mu(x) + \int_Z \Big[\inf_{x\in X}\big(c(x,z)-\phi(x)\big)\Big]_+ d\nu(z)$$

where $[\cdot]_+$ enforces the sign constraint on the dual variables.

Adaptive Regularization (OTARI)

Let $C$ be the cost matrix, $P$ the transport plan, and $\psi$ a strictly convex regularizer (e.g., negative entropy or squared $\ell_2$ norm). OTARI imposes per-point (row and/or column) constraints on $\psi$:

$$(\mathrm{OTARI\text{-}s}) \quad \min_{P\in\Pi(a,b)} \langle P,C\rangle \qquad \text{s.t.} \quad \psi(P_{i:})\le\psi(e_\xi)\ \ \forall i$$

The dual involves explicit Lagrange multipliers for the marginal and per-point regularization constraints. The optimal $P^*$ is expressed as a generalized SoftMax (Assel et al., 2023).
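OTARI itself alternates Bregman projections over the constraint sets; as a self-contained toy of the per-point constraint idea only, the sketch below calibrates a per-row inverse temperature by bisection so that each softmax row meets a prescribed entropy (perplexity) target, in the spirit of perplexity calibration. The function names and the geometric-bisection scheme are illustrative assumptions, not taken from the paper:

```python
import math

def row_entropy(p):
    """Shannon entropy of a probability row."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def calibrate_row(costs, target_entropy, iters=50):
    """Bisect a per-row inverse temperature beta so that the row
    P_i: = softmax(-beta * costs) attains a target entropy.

    Larger beta -> sharper row (lower entropy); smaller beta -> flatter
    row. This mimics a per-point spread constraint psi(P_i:) <= psi(e_xi),
    applied to one row in isolation (marginal coupling is ignored here)."""
    lo, hi = 1e-6, 1e6
    p = None
    for _ in range(iters):
        beta = math.sqrt(lo * hi)  # geometric bisection of the temperature
        w = [math.exp(-beta * c) for c in costs]
        s = sum(w)
        p = [wi / s for wi in w]
        if row_entropy(p) > target_entropy:
            lo = beta  # row too spread out: sharpen
        else:
            hi = beta  # row too concentrated: flatten
    return p
```

For a cost row `[0, 1, 2, 3]` and target entropy `log 2` (perplexity 2), the calibrated row concentrates on roughly two effective neighbors.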

Quadratic Regularization for Sparse Graphs

The OAT regularized transport problem:

$$\min_{T\ge 0} \langle T, C\rangle + \frac{\alpha}{2}\|T\|_F^2 \quad \text{s.t.}\ \ T\mathbf{1}=p,\ \ T^\top\mathbf{1}=q$$

yields a transport plan

$$T^*_{ij} = \max\left\{0,\ \frac{u_i + v_j - C_{ij}}{\alpha}\right\}$$

with $u,v$ found by dual optimization. This produces a sparse and data-adaptive neighborhood structure, with the parameter $\alpha$ controlling local connectivity (Matsumoto et al., 2022).
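The cited work solves the dual with a damped Newton method; a simpler (and slower) gradient-ascent sketch on the dual potentials makes the soft-threshold structure concrete. Step size and iteration counts here are illustrative choices:

```python
def quad_ot_plan(C, p, q, alpha=1.0, lr=0.2, n_iter=5000):
    """Quadratically regularized OT via gradient ascent on the dual.

    The primal plan is the soft-threshold T_ij = max(0, (u_i+v_j-C_ij)/alpha);
    entries with u_i + v_j <= C_ij are exactly zero, which is what makes the
    induced neighborhood graph sparse."""
    n, m = len(p), len(q)
    u = [0.0] * n
    v = [0.0] * m

    def plan():
        return [[max(0.0, (u[i] + v[j] - C[i][j]) / alpha)
                 for j in range(m)] for i in range(n)]

    for _ in range(n_iter):
        T = plan()
        # dual gradient in u_i is the row-marginal residual p_i - sum_j T_ij
        for i in range(n):
            u[i] += lr * (p[i] - sum(T[i]))
        T = plan()
        # dual gradient in v_j is the column-marginal residual
        for j in range(m):
            v[j] += lr * (q[j] - sum(T[i][j] for i in range(n)))
    return plan()
```

On a toy symmetric cost, the solver recovers a sparse plan whose marginals match `p` and `q`, with the cross terms thresholded exactly to zero.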

Acceleration-based OAT (Generative Modeling)

The OAT action between distributions $\mu_0$ and $\mu_1$ is

$$A_2^2(\mu_0,\mu_1) = \min_{\mu_t,\, a} \int_0^1 \int_{X\times V} \frac{1}{2}\, \mu_t(x,v)\, \|a(x,v,t)\|^2 \, dv\, dx\, dt$$

subject to a Vlasov (second-order continuity) equation. The cost structure induces trajectory straightness and minimal acceleration (Yue et al., 29 Sep 2025).
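At the level of a single trajectory, the path that minimizes integrated squared acceleration with fixed endpoint positions and velocities satisfies $x''''(t)=0$, i.e., it is the cubic Hermite interpolant. A minimal one-dimensional sketch (a standard fact about spline interpolation, shown here for intuition rather than as the paper's algorithm):

```python
def cubic_hermite(x0, v0, x1, v1, t):
    """Minimum-acceleration path between state (x0, v0) at t=0 and
    (x1, v1) at t=1: the unique cubic matching both positions and
    velocities (Euler-Lagrange equation of the acceleration action
    is x'''' = 0)."""
    h00 = 2 * t**3 - 3 * t**2 + 1   # weight on x0
    h10 = t**3 - 2 * t**2 + t       # weight on v0
    h01 = -2 * t**3 + 3 * t**2      # weight on x1
    h11 = t**3 - t**2               # weight on v1
    return h00 * x0 + h10 * v0 + h01 * x1 + h11 * v1
```

The interpolant hits both endpoint positions exactly, and finite differences confirm the endpoint velocities.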

3. Algorithms and Computational Strategies

| Method/Class | Regularization/Constraint | Iterative Procedure |
|---|---|---|
| AOT (adaptive mass) | Marginal inequalities | Sinkhorn-style scaling with clipping |
| OTARI | Per-point regularization (row/column) | Bregman projection / Dykstra |
| Quadratic OAT (graphs) | Frobenius-norm penalty | Damped Newton / dual optimization |
| Multimodal OAT (RecGOAT) | Entropic + learnable residual | Sinkhorn + supervised fine-tuning |
| Acceleration OAT (OAT-FM) | Acceleration cost in minibatches | Bi-level OT + acceleration loss |
| Adapted (causal) OT | Temporal causal constraints | Alternating Sinkhorn / Schrödinger |

Adaptive Sinkhorn Algorithms: In the adaptive mass setting, updates generalize classical Sinkhorn scaling; at each step, scaling vectors are clipped to avoid exceeding marginal upper bounds, and this allows the mass to be optimally and adaptively allocated (Yang et al., 7 Mar 2025). For OTARI, Bregman projections are alternated over marginal and regularization constraints, allowing global convergence with per-point smoothing guarantees (Assel et al., 2023).
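A minimal sketch of the clipped-scaling idea, assuming a plausible update rule in which each scaling vector is capped at 1 so the plan's marginals never exceed the prescribed upper bounds (the exact rule in the cited work may differ):

```python
import math

def adaptive_mass_sinkhorn(C, mu, nu, eps=0.2, n_iter=500):
    """Sinkhorn-style scaling for OT with marginal *inequality* constraints.

    Scalings are clipped at 1, so row sums stay at most mu and column sums
    at most nu; the solver then allocates mass only where the (mixed-sign)
    cost favors transport, leaving the rest behind."""
    n, m = len(mu), len(nu)
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # row scaling, clipped so each row sum stays <= mu_i
        for i in range(n):
            s = sum(K[i][j] * v[j] for j in range(m))
            u[i] = min(mu[i] / s, 1.0)
        # column scaling, clipped so each column sum stays <= nu_j
        for j in range(m):
            s = sum(u[i] * K[i][j] for i in range(n))
            v[j] = min(nu[j] / s, 1.0)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

With a mixed-sign cost, essentially all transported mass lands on the negative-cost cells, and the total transported mass comes out strictly below 1, illustrating the adaptive "active region" behavior.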

Quadratic Regularization & Sparse Graphs: Dual approaches using damped Newton or coordinate descent efficiently solve the quadratic OT problem and scale to high dimensions, building locally adaptive graphs without per-point parameter tuning (Matsumoto et al., 2022).

Acceleration-based OAT: OAT-FM takes a pre-trained flow-matching model and minimizes a loss reflecting per-pair acceleration between matched sample pairs, with mini-batch OT couplings determining pairings at each iteration (Yue et al., 29 Sep 2025).

4. Theoretical Guarantees and Interpretations

  • Uniqueness and Duality: For convex regularization (entropic or quadratic), OAT/OTARI admits strong duality and uniqueness results; per-point constraints guarantee a minimum spread of mass for every datapoint—removing the collapse-to-delta property of global OT (Assel et al., 2023).
  • Active Region Selection: By relaxing mass constraints and using mixed-sign costs, OAT achieves automatic feature selection, transporting only a subset of the mass and filtering out noise/outliers (Yang et al., 7 Mar 2025).
  • Alignment Consistency & Fusion Guarantees: In multimodal settings, upper bounds relating Wasserstein distance to prediction error connect the sharpness of OAT-induced alignment to downstream unified model accuracy (Li et al., 31 Jan 2026).
  • Straightness and Action Minimization: In OAT-FM, minimizing acceleration cost produces trajectories that are cubic interpolants—ensuring necessary and sufficient straightness of flows and a unique minimum-action path for each pair (Yue et al., 29 Sep 2025).

5. Applications and Empirical Results

Domain Adaptation: OAT improves the stability and effectiveness of source–target alignments. Adaptive mass allocation allows it to discard mismatched classes and reduces sensitivity to outliers. Empirical improvements of 2–6 percentage points in target accuracy are reported on benchmarks such as VisDA and Office-Home, relative to fixed-mass OT and MMD baselines (Yang et al., 7 Mar 2025, Assel et al., 2023).

Sparse Graph Construction: Adaptive quadratic OT yields neighborhood graphs that remain robust under variable sampling density and noise, outperforming fixed-$k$ nearest-neighbor graphs in manifold learning, spectral clustering, and semi-supervised learning. On high-dimensional RNA-seq and image data, OAT graphs recover ground-truth structure more faithfully (Matsumoto et al., 2022).

Multimodal Representation Alignment: In RecGOAT, OAT is applied to align distributional embeddings across LLM-derived and collaborative-filtering modalities using learnable transport plans; adaptive alignment significantly narrows the gap between unimodal and fused multimodal prediction errors on large recommendation benchmarks (Li et al., 31 Jan 2026).

Generative Modeling/Flow Matching: OAT-FM sharpens and “straightens” generative trajectories. Empirical improvements include a consistent reduction in FID by 0.1–0.2 on datasets like CIFAR-10 and further improvements on large-scale ImageNet generation (Yue et al., 29 Sep 2025).

Robust Learning/Decision Optimization: In decision-focused few-shot learning, OAT is used to design class-adaptive priors for robust Sinkhorn-DRO, improving worst-case and average accuracy under distribution shift, with provable contraction and consistency guarantees (Sun et al., 1 Feb 2026).

6. Connections to Mesh Adaptivity and Manifold Methods

OAT-based mesh adaptation is formulated as the solution of a Monge–Ampère PDE whose solution yields a mesh mapping that equidistributes a user-chosen scalar monitor function. The induced Jacobian matrix encodes principal axes (eigenvectors) and anisotropy (eigenvalues) of mesh elements; the mesh is “M-uniform” with respect to the implicit metric tensor arising from the OT map (Budd et al., 2014). On compact manifolds, OAT and its analogue, Optimal Information Transport (OIT), provide frameworks for diffeomorphic mesh redistribution, with Poisson-based OIT demonstrating superior robustness on non-smooth targets and manifold domains (Turnquist, 2021).
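In one dimension the equidistribution principle reduces to a simple computation: place the mesh nodes so that each cell carries an equal share of the integral of the monitor function. The sketch below illustrates only this 1D analogue (the full method solves a Monge–Ampère PDE in higher dimensions); grid resolution and function names are illustrative:

```python
def equidistribute(monitor, n_cells, a=0.0, b=1.0, n_quad=1000):
    """1D analogue of OT mesh adaptation: choose nodes so each cell holds
    an equal share of the integral of the monitor function M(x) > 0.
    Nodes cluster where M is large."""
    # cumulative integral of M on a fine uniform grid (trapezoid rule)
    h = (b - a) / n_quad
    xs = [a + k * h for k in range(n_quad + 1)]
    Ms = [monitor(x) for x in xs]
    cum = [0.0]
    for k in range(n_quad):
        cum.append(cum[-1] + 0.5 * (Ms[k] + Ms[k + 1]) * h)
    total = cum[-1]
    # invert the cumulative integral at equally spaced levels
    nodes = [a]
    for i in range(1, n_cells):
        level = total * i / n_cells
        k = next(k for k in range(n_quad + 1) if cum[k] >= level)
        nodes.append(xs[k])
    nodes.append(b)
    return nodes
```

For a monitor growing toward the right endpoint (e.g., $M(x)=1+10x$), the interior nodes shift toward that endpoint, reproducing the mesh-concentration behavior described above.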

7. Extensions, Limitations, and Outlook

  • Faster Solvers and Scalability: For adaptive mass and OTARI, each Sinkhorn-style or Bregman-projection iteration costs $O(n^2)$, so further research into low-rank and structure-exploiting approximations is needed for very large problems (Yang et al., 7 Mar 2025, Assel et al., 2023).
  • Joint Learning of Cost and Transport: Integrating adaptive OT layers with deep learning models, learning $c_\theta(x,z)$ end-to-end, remains a frontier for applications that demand simultaneous adaptation of features and mass transfer (Yang et al., 7 Mar 2025).
  • Robustness, Statistical Theory, and Theoretical Limits: Although OAT provides statistical and empirical robustness, the convergence rates of empirical OAT estimators remain an open problem, particularly in high-dimensional and partial (adaptive mass) regimes (Yang et al., 7 Mar 2025, Matsumoto et al., 2022).
  • Causality and Temporal Structure: OAT is also being extended to the causal/adapted setting for time series and dynamical systems, where adapted Sinkhorn algorithms maintain tractability while imposing temporal constraints (Eckstein et al., 2022).
  • Action-based Interpolations and Second-order Flows: Acceleration-based OAT connects closely to optimal control and physics, opening new directions for interpolation and flow matching in kinetic and generative modeling (Yue et al., 29 Sep 2025).

In summary, OAT generalizes and advances the OT paradigm by embedding problem-adaptive mechanisms into the transport process—be they local regularization, mass allocation, metric learning, or higher-order trajectory control. This versatility enables enhanced robustness, local adaptivity, and superior empirical performance across a wide variety of modern mathematical, algorithmic, and learning tasks.
