Optimal Transport Chaining

Updated 4 February 2026

Optimal Transport Chaining is a framework for sequentially aligning and computing transport mappings that integrate intermediate states, coupling, and regularity constraints.
It employs specialized algorithms such as ILP network flow, Sinkhorn iterations, and dynamic programming to efficiently solve complex multi-stage transport problems.
The approach drives applications from mobility planning and domain adaptation to topological anomaly detection, ensuring robust, interpretable, and computationally tractable solutions.

Optimal Transport Chaining is a unifying framework for modeling, computing, and aligning structural relationships among measures, plans, topological features, and stochastic processes through the sequential composition of optimal transport steps under coupling, metric, or regularity constraints. It encodes the solution as a chain, often through discrete or parametric intermediate states, anchor points, or plan variants, and the chain structure can arise from time-window flexibility, conditional marginals, topological persistence, or causal restrictions. This approach generalizes classical optimal transport, in which mass moves directly from source to target, to settings where multiple sequential or layered mappings must be constructed subject to complex feasibility or regularity criteria.

1. Mathematical Formulation of Chaining Problems

The canonical mathematical structures of optimal transport chaining can be organized as multi-stage optimization problems, each stage described by its marginal, conditional, or structural constraints. Examples include:

Plan chaining with time windows: Given a set of plans $P$ with time windows and delays $\theta_p$, and vehicles $V$, feasible connections are constructed among delayed plan-variants $p^\delta \in PV(p)$ such that arcs $(a \rightarrow b)$ (with $a, b \in V \cup P \cup PV$) respect travel-time and cost constraints, forming directed chains $v \rightarrow p^\delta \rightarrow q^{\delta'} \rightarrow \cdots$. Decision variables $x_{ab} \in \{0,1\}$ encode chain membership, with constraints ensuring every original plan is covered exactly once in an admissible variant, no vehicle initiates more than one chain, and intermediate variants enforce flow conservation. The optimization is:

$\min \sum_{(a \rightarrow b) \in C} c_{ab} x_{ab},$

subject to coupling, integrality, and time-window feasibility constraints (Fiedler et al., 2024).

Chain Rule OT for statistical mixtures: For joint distributions $P(X,Y)$ and $Q(X,Y)$, the chain-rule OT metric is

$W_{\rm chain}(P, Q) = \inf_{\pi \in \Pi(P_X, Q_X)} \int_{\mathcal X} \int_{\mathcal X} d(P_{Y|X}(\cdot|x), Q_{Y|X}(\cdot|x')) \, \pi(x, x') \, dx \, dx',$

where $d$ is a ground distance acting on the space of conditionals (Nielsen et al., 2018).

Latent OT via anchor chains: Transport is forced through a sequence of anchor measures $Z_x, Z_y$ with weights $u_z, v_z$, structured via three coupling matrices $(P_x, P_z, P_y)$ connected by the rule $P = P_x \operatorname{diag}(u_z^{-1}) P_z \operatorname{diag}(v_z^{-1}) P_y$. Marginals and anchor weights are updated via entropic projections and anchoring equations (Lin et al., 2020).
Topological chain alignment: Multi-filtration persistence diagrams are aligned sequentially using entropy-regularized OT, where stability scores for topological features are computed across thresholds and filtrations by chaining optimal couplings $\Pi^{(k,k+1)}$ (Zia et al., 28 Jan 2026).
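The chain-rule OT metric above can be illustrated in the discrete mixture setting, where $X$ indexes mixture components and the outer coupling $\pi$ matches component weights. The following is a minimal sketch under strong assumptions that are not from the source: both mixtures have the same number of components with uniform weights (so an optimal coupling can be taken to be a permutation, by the Birkhoff-von Neumann theorem), components are 1D Gaussians given as `(mean, std)`, and the ground distance $d$ is the closed-form 2-Wasserstein distance between 1D Gaussians. The function names are hypothetical, and the brute-force search is only a stand-in for a proper LP or Sinkhorn solver.

```python
import itertools
import math

def gaussian_w2(p, q):
    """2-Wasserstein distance between two 1D Gaussians given as (mean, std)."""
    (m1, s1), (m2, s2) = p, q
    return math.hypot(m1 - m2, s1 - s2)

def chain_rule_ot_uniform(comps_p, comps_q, dist=gaussian_w2):
    """Chain-rule OT between two uniform-weight mixtures of equal size.

    Assumption: uniform weights 1/k on both sides, so it suffices to
    search over permutations instead of solving a general linear program.
    Returns (optimal value, optimal matching as a tuple).
    """
    k = len(comps_p)
    if len(comps_q) != k:
        raise ValueError("this sketch requires equally many components")
    # Ground cost between conditionals P_{Y|X=i} and Q_{Y|X=j}.
    cost = [[dist(p, q) for q in comps_q] for p in comps_p]
    best_value, best_perm = math.inf, None
    for perm in itertools.permutations(range(k)):
        value = sum(cost[i][perm[i]] for i in range(k)) / k
        if value < best_value:
            best_value, best_perm = value, perm
    return best_value, best_perm

# Illustrative mixtures: Q is a shuffled, slightly perturbed copy of P.
P = [(0.0, 1.0), (5.0, 1.0), (10.0, 2.0)]
Q = [(10.5, 2.0), (0.0, 1.0), (5.0, 1.5)]
value, perm = chain_rule_ot_uniform(P, Q)
```

Here `perm[i]` is the index of the component of `Q` matched to component `i` of `P`. The factorial search is only practical for a handful of components; for general weights the outer problem is a linear program over $\Pi(P_X, Q_X)$.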

2. Sequential Structure and Coupling Constraints

Chaining proceeds by introducing intermediate states or plan-variants (vehicle plans, anchor points, topological features, conditional distributions, Markov chain states), each equipped with feasibility, cost, or metric constraints that determine admissible transitions. Notable coupling structures include:

Time-window-based feasibility: Delayed plan-variants admit only transitions $(a^\varepsilon \rightarrow b^\delta)$ such that $e_{a^\varepsilon} + \tau_{ab} \leq s_{b^\delta}$, leveraging the flexibility in plan delays up to $\theta_p$ (Fiedler et al., 2024).
Intermediate anchor factorization: Mass transport proceeds strictly through anchor nodes, creating low-rank couplings and robust alignments; the overall coupling $P$ is the matrix product of anchor-wise transports (Lin et al., 2020).
Conditional coupling for mixtures: The optimal coupling minimizes aggregated cost over conditional distributions, yielding metrics that unify OT on points, mixtures, and provide upper bounds for jointly convex divergences (Nielsen et al., 2018).
Topological feature chaining: Cross-threshold and cross-filtration couplings are computed via entropy-regularized OT on birth-death pairs of persistence diagrams. Stability scores select features that are chain-stable both within filtrations and across filtration types (Zia et al., 28 Jan 2026).
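The time-window feasibility rule $e_{a^\varepsilon} + \tau_{ab} \leq s_{b^\delta}$ listed above can be sketched as a small feasibility check. This is an illustrative sketch, not the variant generation algorithm of Fiedler et al.: it assumes that delaying a plan by $\delta$ shifts both its start and end times by $\delta$, and the `Plan` class, `minimal_delay`, and `feasible_arcs` are hypothetical names introduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    name: str
    start: float      # undelayed start time s_p
    end: float        # undelayed end time e_p
    max_delay: float  # maximum admissible delay theta_p

def minimal_delay(a, eps, b, travel_time):
    """Smallest delay delta of plan b such that a^eps -> b^delta is feasible.

    Assumption: a delay shifts start and end by the same amount.
    Returns None when no admissible delay of b makes the arc feasible.
    """
    if not 0.0 <= eps <= a.max_delay:
        raise ValueError("eps must lie in [0, theta_a]")
    arrival = a.end + eps + travel_time      # e_{a^eps} + tau_ab
    delta = max(0.0, arrival - b.start)      # least shift of s_b that suffices
    return delta if delta <= b.max_delay else None

def feasible_arcs(plans, travel_time):
    """List (a, b, delta) for undelayed a and the least-delayed feasible b."""
    arcs = []
    for a in plans:
        for b in plans:
            if a is b:
                continue
            delta = minimal_delay(a, 0.0, b, travel_time[(a.name, b.name)])
            if delta is not None:
                arcs.append((a.name, b.name, delta))
    return arcs

# Illustrative data: three plans and a constant travel time of 4 time units.
plans = [Plan("p", 0.0, 10.0, 5.0), Plan("q", 12.0, 20.0, 3.0), Plan("r", 15.0, 25.0, 0.0)]
tau = {(a.name, b.name): 4.0 for a in plans for b in plans if a is not b}
arcs = feasible_arcs(plans, tau)
```

In this example only `p` can precede another plan: `q` must be delayed by 2 time units to follow `p`, while `r` can follow `p` without delay. Arcs whose required delay exceeds $\theta_p$ are pruned before any optimization is run.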

3. Algorithmic Solutions and Complexity

Chaining problems are solved by specialized algorithms tailored to their structural constraints:

Variant generation and network flow: For time-window chaining, a variant generation algorithm constructs only those delayed plan-variants necessary to realize all feasible arcs, followed by an ILP or min-cost flow on a layered graph. The problem is solvable in polynomial time in the absence of coupling constraints and remains tractable in practice (seconds to minutes) for large instances (Fiedler et al., 2024).
Sinkhorn and multi-Sinkhorn iterations: Entropic regularization yields fast differentiable solvers for chain-rule OT and anchor-based coupling problems. In the mixture/chain rule setting, Sinkhorn-type updates scale with the number of mixture components; for multi-anchor transport, block-coordinate multi-Sinkhorn iterations enforce the six marginal constraints (Nielsen et al., 2018, Lin et al., 2020).
Dynamic programming recursion: Bicausal OT chaining for Markov chains utilizes a Bellman-type recursive value iteration, solving linear programs per state-time pair, ensuring convergence via contraction properties (Moulos, 2020).
Sequential topological alignment: OT chaining for persistence diagrams involves repeated computation of entropy-regularized couplings between PDs at successive thresholds, aggregation of intra-filtration and cross-filtration stability scores, and filtering chains to retain only persistent, well-aligned topological features (Zia et al., 28 Jan 2026).
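The Sinkhorn-based entries above share one building block: alternating scaling of a Gibbs kernel until both marginals are matched, repeated for each consecutive pair in a chain to obtain couplings $\Pi^{(k,k+1)}$. The sketch below uses NumPy and makes several assumptions that are not from the source: 1D supports, squared-distance cost, a fixed regularization `reg=0.5`, and a fixed iteration count instead of a convergence test. It is a plain Sinkhorn loop, not the multi-Sinkhorn scheme of Lin et al.

```python
import numpy as np

def sinkhorn(a, b, cost, reg=0.5, n_iters=300):
    """Entropy-regularized OT coupling between histograms a and b."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale to match row marginals
        v = b / (K.T @ u)            # rescale to match column marginals
    return u[:, None] * K * v[None, :]

def chain_couplings(supports, weights, reg=0.5):
    """Couplings between consecutive measures in a chain of 1D measures."""
    couplings = []
    for k in range(len(supports) - 1):
        x, y = supports[k], supports[k + 1]
        cost = (x[:, None] - y[None, :]) ** 2   # squared-distance ground cost
        couplings.append(sinkhorn(weights[k], weights[k + 1], cost, reg))
    return couplings

# Illustrative chain of three discrete measures on [0, 1].
supports = [np.array([0.0, 0.5, 1.0]),
            np.array([0.0, 0.25, 0.75, 1.0]),
            np.array([0.2, 0.8])]
weights = [np.full(3, 1 / 3), np.full(4, 1 / 4), np.full(2, 1 / 2)]
couplings = chain_couplings(supports, weights)
```

Each entry of `couplings` has the marginals of its two neighboring measures. Smaller values of `reg` approach unregularized OT but need more iterations and, in practice, log-domain updates for numerical stability.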

4. Theoretical Guarantees and Metric Properties

Optimal Transport Chaining inherits and extends several theoretical results from OT, mixture theory, and martingale transport:

Optimality and completeness: Time-window chaining achieves completeness and optimality by generating only variants needed for feasible connections, with correctness proven via minimal-delay sufficiency (Fiedler et al., 2024).
Metric properties: Chain-rule OT is a bona-fide metric whenever the conditional ground distance $d$ is a metric, satisfying nonnegativity, symmetry, and triangle inequality (Nielsen et al., 2018).
Upper bounds for divergences: Chaining distances provide upper bounds for jointly convex divergences such as $f$-divergences and Wasserstein metrics, ensuring robustness in mixture learning and distribution simplification (Nielsen et al., 2018).
Stability guarantees: In topological OT chaining, the bottleneck stability of persistence diagrams and Lipschitz continuity of entropy-regularized OT couplings enforce robustness to perturbations and noise, while the chaining process provably discards unstable, noise-induced features (Zia et al., 28 Jan 2026).
Sample complexity and low-rank bias: Chain-based OT relaxations such as anchor-based LOT admit faster statistical convergence ($\propto \sqrt{k^3 d \log k / N}$) compared to vanilla OT ($N^{-1/d}$) and produce low-rank couplings interpretable via cluster membership (Lin et al., 2020).
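The low-rank property mentioned above follows directly from the anchor factorization $P = P_x \operatorname{diag}(u_z^{-1}) P_z \operatorname{diag}(v_z^{-1}) P_y$: the composed coupling cannot have rank larger than the number of anchors. The NumPy sketch below checks this numerically. The three couplings are random but mutually consistent matrices built only for illustration; they are not the output of the LOT solver, and `compose_latent_coupling` is a hypothetical helper name.

```python
import numpy as np

def compose_latent_coupling(P_x, P_z, P_y):
    """Compose P = P_x diag(1/u_z) P_z diag(1/v_z) P_y."""
    u_z = P_x.sum(axis=0)   # source-side anchor weights (column sums of P_x)
    v_z = P_y.sum(axis=1)   # target-side anchor weights (row sums of P_y)
    # Dividing by a 1D array rescales columns, i.e. right-multiplies by diag(1/.).
    return (P_x / u_z) @ (P_z / v_z) @ P_y

# Illustrative sizes: 6 source points, 2 + 2 anchors, 5 target points.
rng = np.random.default_rng(0)
P_x = rng.random((6, 2))
P_x /= P_x.sum()                                  # source -> source anchors
u_z = P_x.sum(axis=0)
T_z = rng.random((2, 2))
T_z /= T_z.sum(axis=1, keepdims=True)             # row-stochastic transitions
P_z = u_z[:, None] * T_z                          # anchors -> anchors, rows sum to u_z
v_z = P_z.sum(axis=0)
T_y = rng.random((2, 5))
T_y /= T_y.sum(axis=1, keepdims=True)
P_y = v_z[:, None] * T_y                          # target anchors -> target, rows sum to v_z

P = compose_latent_coupling(P_x, P_z, P_y)
rank = np.linalg.matrix_rank(P)
```

The composed `P` keeps the source marginal of `P_x` and the target marginal of `P_y`, while its rank is bounded by the number of anchors (2 here) even though it is a 6-by-5 matrix.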

5. Applications and Empirical Performance

Optimal Transport Chaining finds utility across diverse domains:

Mobility-on-Demand and fleet sizing: Embedding time-window chaining into dial-a-ride heuristics allows for the reduction of fleet size and driving distance, outperforming insertion and metaheuristic baselines in urban taxi datasets, with efficient runtimes and scalable ILP formulations (Fiedler et al., 2024).
Gaussian Mixture Model (GMM) learning: Chain-rule OT provides a differentiable framework for learning GMMs by simplifying kernel density estimators and optimizing the OT-based upper bound on KL divergence. Empirical evidence on MNIST and Fashion-MNIST shows improved KL scores and competitive computation time versus EM algorithms (Nielsen et al., 2018).
Domain adaptation and distribution alignment: Anchor-based chaining yields interpretable and robust solutions for dataset alignment, outperforming unregularized OT under noise, high dimension, and cluster mismatches. LOT achieves superior accuracy and stability in structured alignment tasks (Lin et al., 2020).
Topology-aware anomaly segmentation: OT chaining of persistence diagrams, assembled via geodesic-style stability scores, delivers state-of-the-art mean F1 improvements for both 2D and 3D anomaly detection tasks, replacing brittle thresholding with structure-aware pseudo-label supervision (Zia et al., 28 Jan 2026).

6. Connections to Classical and Weak Transport Theory

Optimal Transport Chaining is tightly connected to several foundational results:

Martingale and weak transport: The barycentric (weak) quadratic cost admits canonical chaining via deterministic maps followed by martingale couplings, as established by Gozlan–Juillet. The optimal chain decomposes as a Brenier map onto a convex-ordered intermediate, then a martingale transport plan, with explicit dual formulations and regularity properties (Gozlan et al., 2018).
Bicausal and faithful couplings: Chaining on Markov path-space, under bicausal constraints, provides a dynamic-programming (Bellman) structure, recovers static OT as a special case, and encompasses faithful couplings for Markov chains, minimizing expected meeting time or extending coupling-time minimization (Moulos, 2020).
Extensions of classical OT: Chaining models unify OT distances on points, finite mixtures, and weak transport, and admit additional metric, contractivity, or order-theoretic properties (e.g., convex order, Monge–Ampère regularity, Caffarelli contraction for log-concave $\nu$) (Gozlan et al., 2018, Nielsen et al., 2018).
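The Bellman structure of bicausal couplings noted above can be sketched as a backward recursion $V_t(x,y) = c(x,y) + \min_{\pi \in \Pi(P(x,\cdot),\, Q(y,\cdot))} \sum_{x',y'} \pi(x',y')\, V_{t+1}(x',y')$ with $V_T = c$. The version below is a simplified illustration under assumptions that are not from the source: a finite horizon without discounting, and two-state chains, so that each inner OT problem has a single free parameter and is solved by comparing the two extreme couplings instead of calling an LP solver. All function names are hypothetical.

```python
def ot_two_point(p0, q0, V):
    """Minimal coupling cost between (p0, 1-p0) and (q0, 1-q0) for cost matrix V.

    Couplings on a 2x2 grid are parametrized by t = pi[0][0]; the objective
    is linear in t, so the minimum is attained at an endpoint of its range.
    """
    lo, hi = max(0.0, p0 + q0 - 1.0), min(p0, q0)

    def value(t):
        return (t * V[0][0] + (p0 - t) * V[0][1]
                + (q0 - t) * V[1][0] + (1.0 - p0 - q0 + t) * V[1][1])

    return min(value(lo), value(hi))

def bicausal_value(P, Q, cost, horizon):
    """Backward recursion for the finite-horizon bicausal transport cost.

    P[x][0] and Q[y][0] are the probabilities of moving to state 0.
    Returns V with V[x][y] the optimal cost starting from the pair (x, y).
    """
    V = [row[:] for row in cost]                 # terminal condition V_T = c
    for _ in range(horizon):
        V = [[cost[x][y] + ot_two_point(P[x][0], Q[y][0], V)
              for y in range(2)] for x in range(2)]
    return V

# Illustrative two-state chain compared with itself under a 0/1 mismatch cost.
P = [[0.9, 0.1], [0.2, 0.8]]
cost = [[0.0, 1.0], [1.0, 0.0]]
V1 = bicausal_value(P, P, cost, horizon=1)
V5 = bicausal_value(P, P, cost, horizon=5)
```

When both chains share a kernel and start in the same state, the recursion returns zero cost because the two paths can be kept together at every step. For larger state spaces the endpoint comparison would be replaced by a linear program per state pair, as the text describes.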

7. Design Principles, Interpretability, and Robustness

Central advantages of optimal transport chaining include structural interpretability, noise robustness, and computational tractability:

Interpretability: Chain-based approaches clarify the mapping between source and target via intermediate anchors, plan variants, or topological feature chains, yielding visual explanations (e.g., Sankey graphs, cluster-wise couplings, time-delayed routes) (Lin et al., 2020, Fiedler et al., 2024).
Robustness: By enforcing sequential alignment and cost minimization only through stable intermediates, chaining mitigates the detrimental effects of sampling noise, outliers, and spurious features. Empirical results indicate consistent performance even as data complexity increases or input distributions are perturbed (Lin et al., 2020, Zia et al., 28 Jan 2026).
Computational tractability: Chaining formulations reduce search space by pruning infeasible transitions, exploiting factorization, and leveraging entropic regularization and value iteration, ensuring polynomial or near-polynomial solution time on realistic instances (Fiedler et al., 2024, Lin et al., 2020, Moulos, 2020).

Optimal Transport Chaining encapsulates a broad spectrum of theoretically sound, practically efficient methods for combining stochastic, geometric, or topological objects under layered or sequential constraints, expanding the reach of optimal transport theory to complex and structured modern data environments.