
Entropic Regularized TiOT

Updated 30 December 2025
  • The paper introduces a novel entropic regularized framework that approximates Time-integrated Optimal Transport (TiOT) using a minimax Wasserstein metric and block coordinate descent.
  • It employs Sinkhorn iterations and projected-gradient updates to balance temporal alignment with feature-distribution similarity, ensuring robust convergence and numerical stability.
  • Empirical results demonstrate improved one-to-one temporal feature matching and competitive classification accuracy on benchmark time series compared to classical OT methods.

The entropic regularized approximation of Time-integrated Optimal Transport (TiOT) is a computational framework for comparing time series and general temporal or sequential data via minimax optimal transport objectives. The TiOT metric integrates both temporal alignment and feature-wise distributional similarity by forming a robust Wasserstein-type distance. Entropic regularization is introduced to the inner optimal transport subproblem to obtain a strongly convex program that is efficiently solved by block coordinate descent methods, yielding reliable statistical rates and practical scalability for large datasets (Nguyen et al., 26 Dec 2025).

1. Definition and Structure of Time-integrated Optimal Transport

Let $\alpha = \sum_i a_i \delta_{(x_i,t_i)}$ and $\beta = \sum_j b_j \delta_{(y_j,s_j)}$ be discrete probability measures on $\mathbb{R}^d \times \mathbb{R}$, representing distributions of features and timestamps. The TiOT metric is defined as

$$\mathcal{D}_p(\alpha, \beta) = \max_{w \in [0,1]} \left\{ \min_{\pi \in \Pi(\alpha, \beta)} \int d_{p,w}((x,t),(y,s))^p \, d\pi \right\}^{1/p}$$

where $d_{p,w}((x,t),(y,s)) = \left( w \|x-y\|_p^p + (1-w)|t-s|^p \right)^{1/p}$ integrates feature and temporal discrepancy, and $\Pi(\alpha, \beta)$ is the set of couplings with the specified marginals.

This formulation produces a minimax Wasserstein metric, robustly balancing the spatial (feature) and temporal components by maximizing over the interpolation parameter $w$. In the discrete setting, the cost matrix $C(w)$ has entries $C(w)_{ij} = d_{p,w}((x_i, t_i), (y_j, s_j))^p$ (Nguyen et al., 26 Dec 2025).
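The discrete cost matrix can be assembled directly from this definition; a minimal NumPy sketch (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def tiot_cost_matrix(x, t, y, s, w, p=2):
    """Cost C(w)_ij = w * ||x_i - y_j||_p^p + (1 - w) * |t_i - s_j|^p."""
    # Pairwise feature cost: sum_k |x_ik - y_jk|^p, via broadcasting
    feat = (np.abs(x[:, None, :] - y[None, :, :]) ** p).sum(axis=-1)
    # Pairwise temporal cost: |t_i - s_j|^p
    temp = np.abs(t[:, None] - s[None, :]) ** p
    return w * feat + (1.0 - w) * temp
```

Note that the $1/p$-th root and the outer $p$-th power in $d_{p,w}^p$ cancel, so the matrix is simply the weighted sum of raw $p$-th-power discrepancies.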

2. Entropic Regularization of TiOT

Entropic regularization is applied within the inner minimization:

$$\min_{\pi \in \Pi(\alpha, \beta)} \langle C(w), \pi \rangle + \varepsilon \,\mathrm{KL}(\pi \| a \circ b)$$

where $\mathrm{KL}(\pi \| a \circ b) = \sum_{i,j} \big[ \pi_{ij} \ln\!\big(\tfrac{\pi_{ij}}{a_i b_j}\big) - \pi_{ij} + a_i b_j \big]$ is the Kullback–Leibler divergence relative to the product measure. This regularization enforces strict convexity and numerical stability in the coupling, enabling log-domain solution methods. The outer maximization in $w$ remains unregularized, preserving the minimax structure (Nguyen et al., 26 Dec 2025).
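For fixed $w$, this inner problem is a standard entropic OT program solvable by classical Sinkhorn scaling. A minimal sketch using the plain Gibbs kernel $K = \exp(-C/\varepsilon)$ (choosing the uniform rather than the $a \circ b$ reference measure shifts only the dual potentials, not the optimal coupling; names are illustrative):

```python
import numpy as np

def sinkhorn_inner(C, a, b, eps, n_iter=500):
    """Solve min_pi <C, pi> + eps * KL(pi || ref) over couplings Pi(a, b)."""
    K = np.exp(-C / eps)           # Gibbs kernel
    g = np.ones_like(a)            # scaling vectors: pi = Diag(g) K Diag(h)
    h = np.ones_like(b)
    for _ in range(n_iter):
        g = a / (K @ h)            # enforce row marginals
        h = b / (K.T @ g)          # enforce column marginals
    return g[:, None] * K * h[None, :]
```

Each iteration costs two kernel matrix–vector products, which is what makes the overall scheme scale to large point sets.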

The dual Lagrangian, with normalization $a^\top u = 0$, is

$$F(u,v,w) = -u^\top a - v^\top b + \varepsilon \sum_{i,j} a_i b_j \exp\left(\frac{u_i + v_j - c_{ij}(w)}{\varepsilon} \right) - \varepsilon$$

which is jointly convex in $(u, v)$ for fixed $w$.
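The dual objective can be evaluated in closed form, e.g. to monitor optimization progress; a small sketch (the function name is illustrative, and `C` denotes $C(w)$ for the current $w$):

```python
import numpy as np

def dual_objective(u, v, C, a, b, eps):
    """F(u, v, w) = -u^T a - v^T b
       + eps * sum_ij a_i b_j exp((u_i + v_j - c_ij)/eps) - eps."""
    E = np.exp((u[:, None] + v[None, :] - C) / eps)
    return -u @ a - v @ b + eps * np.sum(a[:, None] * b[None, :] * E) - eps
```

At a dual optimum the exponential terms sum to one, so the last two terms cancel and $F$ equals the negated dual value.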

3. Algorithmic Framework: Block Coordinate Descent

The entropic-regularized TiOT (eTiOT) problem is solved by alternating updates of the dual potentials $(u, v)$ and the interpolation parameter $w$ in a block coordinate descent (BCD) algorithm. Optimization proceeds as follows:

  • Sinkhorn iterations update $u_i$ and $v_j$ via

$$u_i^{k+1} = -\varepsilon \ln\left[(K(w^k) h^k)_i\right] + \lambda^k, \qquad v_j^{k+1} = -\varepsilon \ln\left[(K(w^k)^\top g^{k+1})_j\right]$$

where $K(w) = \exp(-C(w)/\varepsilon)$ and $\lambda^k$ enforces $a^\top u = 0$ (Nguyen et al., 26 Dec 2025).

  • The $w$ parameter is updated by gradient descent projected onto $[0,1]$:

$$w^{k+1} = \mathrm{Proj}_{[0,1]} \left( w^k - \eta \, \partial_w F(u^{k+1}, v^{k+1}, w^k) \right)$$

Gradient steps for $w$ leverage closed-form expressions with computable block Lipschitz constants, guaranteeing stability even as $\varepsilon \to 0$.

The coupling is reconstructed as

$$\pi = \mathrm{Diag}(g)\, K\, \mathrm{Diag}(h)$$

with $g, h$ the current scaling vectors.

Practical details include updating $w$ only every $\mathrm{freq}$ Sinkhorn cycles, step-size adaptation via the local curvature estimate $\sigma \approx \frac{1}{\varepsilon}\, g^\top \left( (\Phi - \Gamma)^2 \circ K \right) h$ (with $\Phi$ and $\Gamma$ the feature and temporal blocks of the cost), and convergence monitoring by the $\ell_1$ marginal error (Nguyen et al., 26 Dec 2025).
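The overall loop can be sketched as follows, assuming the decomposition $C(w) = w\Phi + (1-w)\Gamma$ into feature and temporal cost matrices; differentiating the stated dual then gives $\partial_w F = -\sum_{i,j} \pi_{ij} (\Phi_{ij} - \Gamma_{ij})$. A fixed step size stands in for the paper's curvature-adapted one, and all names are illustrative:

```python
import numpy as np

def etiot_bcd(phi, gam, a, b, eps=0.05, eta=0.1, w0=0.5,
              freq=10, n_outer=200, tol=5e-3):
    """BCD sketch for eTiOT: Sinkhorn cycles on the potentials
    interleaved with projected gradient steps on w, where
    C(w) = w * phi + (1 - w) * gam."""
    w = w0
    h = np.ones_like(b)
    for _ in range(n_outer):
        C = w * phi + (1.0 - w) * gam
        K = np.exp(-C / eps)
        for _ in range(freq):              # freq Sinkhorn cycles per w update
            g = a / (K @ h)                # row-marginal scaling
            h = b / (K.T @ g)              # column-marginal scaling
        pi = g[:, None] * K * h[None, :]   # coupling Diag(g) K Diag(h)
        cost = np.sum(pi * C)              # current transport cost
        if np.abs(pi.sum(axis=1) - a).sum() < tol:   # l1 marginal error
            break
        # Projected gradient step on w: dF/dw = -sum_ij pi_ij (phi - gam)_ij
        grad_w = -np.sum(pi * (phi - gam))
        w = float(np.clip(w - eta * grad_w, 0.0, 1.0))
    return pi, w, cost
```

The kernel is rebuilt only when $w$ moves, and the transport plan appears only through matrix–vector products with $K$, matching the complexity discussion below.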

4. Theoretical Guarantees and Convergence Properties

The eTiOT optimization problem is convex in the joint variables $(u, v, w)$. Crucial theoretical results include:

  • Dual variables satisfy stability bounds $\|u^k\|_\infty \leq 2\|C\|_\infty$ and $\|v^k\|_\infty \leq 3\|C\|_\infty$ (Lemma 3.3).
  • The block-wise gradient in $w$ is globally Lipschitz, with $|\partial_w F(u,v,w) - \partial_w F(u,v,w')| \leq L_w |w-w'|$, where $L_w = (\| \hat{C} \|_\infty^2 / \varepsilon) \exp(6\|C\|_\infty/\varepsilon)$ (Lemma 3.4).
  • Each iteration decreases the objective by at least $\kappa (\|u^k - u^{k+1}\|^2 + \|v^k - v^{k+1}\|^2) + \tau |w^k - w^{k+1}|^2$, with explicit constants (Lemma 3.5).
  • Global convergence to a stationary point is established (Theorem 3.7), with sublinear rate $O(1/k)$ (Theorem 3.9). The rate constants $\rho_1, \rho_2$ depend polynomially on the problem sizes and exponentially on $\|C\|_\infty/\varepsilon$.

As $\varepsilon \to 0$, the eTiOT cost converges to the original TiOT metric, recovering the exact minimax optimal transport distance.

5. Computational Complexity and Practical Implementation

Each BCD iteration requires $O(mn)$ time, dominated by two kernel matrix–vector multiplications and marginal normalization. No explicit $O(n^2)$ allocation of transport plans is incurred, thanks to the Sinkhorn factorization. The algorithm is amenable to GPU parallelization via block operations on $K$, $g$, $h$.

Recommended hyperparameters:

  • $\varepsilon$ values in $[0.01, 0.1]$ for a practical balance of approximation error and convergence speed.
  • Stopping threshold for the marginal error: $\|g \odot (K h) - a\|_1 < 5 \times 10^{-3}$.
  • Step size $\eta$ adapted via curvature estimates for numerical stability.

Updating $w$ only every $\mathrm{freq}$ cycles (typically $10 \leq \mathrm{freq} \leq 50$) amortizes the gradient cost.

6. Empirical Performance and Applications

Empirical findings indicate that eTiOT:

  • Produces more meaningful one-to-one temporal feature matchings than fixed-$w$ OT, as demonstrated on time series data (Fig. 3) (Nguyen et al., 26 Dec 2025).
  • Its objective converges to that of unregularized TiOT as $\varepsilon \to 0$ (Fig. 4).
  • Incurs a computational overhead of $2$–$3\times$ relative to classical Sinkhorn OT, but is vastly faster than solving the unregularized TiOT LP by direct minimax optimization (Nguyen et al., 26 Dec 2025).
  • On 15 UCR benchmark time series datasets, 1-NN classification accuracy using eTiOT matches or exceeds Euclidean, DTW, and the previous time-adaptive OT (eTAOT) method; robustness to $\varepsilon$ is observed, whereas eTAOT requires careful tuning of the $w$ parameter (Table 1, Fig. 5).

The eTiOT methodology builds on advances in entropic regularized OT:

  • Strong convexity induced by entropy penalties enables efficient, numerically stable approximation of minimax Wasserstein objectives.
  • Sinkhorn iterations and dual and semi-dual strategies facilitate scalable optimization in high-dimensional spaces (Cuturi et al., 2018; Lin et al., 2019; Carlier et al., 2015).
  • Neural parameterizations offer further scalability and minimax-optimal statistical rates in large-scale and high-dimensional OT estimation (Wang et al., 2024).

TiOT thus inherits both the theoretical consistency and practical efficiency of entropic regularization, while extending Wasserstein-based approaches to time-integrated, distributional alignment tasks.


References:

  • TiOT framework, entropic regularization, complexity analysis, and block coordinate algorithm: (Nguyen et al., 26 Dec 2025)
  • Entropic regularization in general OT contexts: (Cuturi et al., 2018; Lin et al., 2019; Carlier et al., 2015; Clason et al., 2019; Wang et al., 2024)
