Continuous Dynamic Time Warping (CDTW)
- Continuous Dynamic Time Warping (CDTW) is a similarity measure that computes the minimum accumulated cost over continuous, piecewise-smooth, and monotonic alignments between time series or curves.
- It overcomes classic DTW’s discrete alignment artifacts and Fréchet distance’s sensitivity to outliers by integrating local matching costs along smooth paths.
- CDTW admits exact, approximate, and differentiable algorithms, with applications in trajectory clustering, action recognition, and time-series dictionary learning.
Continuous Dynamic Time Warping (CDTW) is a similarity measure for time series and curves that generalizes classic discrete Dynamic Time Warping (DTW) to allow continuous, piecewise-smooth, and monotonic alignments between signals or curves. CDTW integrates the local cost of matching curve points along continuous monotone paths, unlike standard DTW which considers only alignments of discrete sample indices. This construction yields robustness to non-uniform sampling, eliminates step-artifacts in alignments, and inherits DTW's resistance to outliers while enjoying the sampling-independence of continuous measures like the Fréchet distance. The CDTW framework appears in several domains, including trajectory clustering, end-to-end learning of alignment paths in deep models, continuous action recognition, and time-series dictionary learning. Recent years have brought both exact and approximate algorithms for CDTW in one and higher dimensions, with complexity bounds, computational barriers under various norms, and differentiable implementations for machine learning.
1. Mathematical Foundation and Formal Definition
CDTW is defined as the minimum accumulated cost over all continuous, monotonic, and boundary-constrained reparameterizations (alignments) between two signals or curves. For two continuous, piecewise-linear curves $\pi : [0, m] \to \mathbb{R}^d$ and $\sigma : [0, n] \to \mathbb{R}^d$ of arc lengths $m$ and $n$, CDTW is given by

$$ d_{\mathrm{CDTW}}(\pi, \sigma) \;=\; \inf_{\alpha, \beta} \int_0^1 \bigl\| \pi(\alpha(t)) - \sigma(\beta(t)) \bigr\| \,\bigl( \alpha'(t) + \beta'(t) \bigr)\, dt, $$

where $\alpha : [0,1] \to [0,m]$ and $\beta : [0,1] \to [0,n]$ are monotone, continuous parameterizations of $\pi$ and $\sigma$, with $\alpha(0) = 0$ and $\alpha(1) = m$ (and similarly for $\beta$), and $\|\cdot\|$ is the underlying norm (typically $L_1$ or $L_2$). When aligning time series $x, y : [0, T] \to \mathbb{R}$, a scalar-valued, monotonic, boundary-constrained warping $\phi$ gives rise to the cost

$$ J(\phi) \;=\; \int_0^T \bigl\| x(t) - y(\phi(t)) \bigr\| \, dt, $$

subject to $\phi(0) = 0$, $\phi(T) = T$, $\phi'(t) \ge 0$, and optional local/global constraints (Xu et al., 2023, Buchin et al., 2022, Deriso et al., 2019, Brankovic et al., 2020, Buchin et al., 25 Nov 2025).
For one-dimensional signals, the parameter space $[0, n] \times [0, m]$ is equipped with the cost surface $\delta(x, y) = |\pi(x) - \sigma(y)|$, and the optimal path is non-decreasing with respect to both time axes (Buchin et al., 2022).
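As a concrete illustration of this construction (a sketch with illustrative names, not an algorithm from the cited papers), the integral over the cost surface can be approximated by densely resampling both signals and running a DTW-style dynamic program whose step weights discretize the $L_1$ length of the path through the parameter space:

```python
import numpy as np

def cdtw_grid(p, q, k=8):
    """Approximate 1D CDTW by dense resampling plus a DTW-style DP.

    p, q: 1D signals (sample arrays). Each signal is linearly interpolated
    and resampled k-fold, so the discrete monotone path approaches a
    continuous monotone path as k grows.
    """
    def resample(x, k):
        t = np.linspace(0, len(x) - 1, (len(x) - 1) * k + 1)
        return np.interp(t, np.arange(len(x)), x)

    a, b = resample(p, k), resample(q, k)
    n, m = len(a), len(b)
    D = np.full((n, m), np.inf)
    D[0, 0] = 0.0
    for i in range(n):
        for j in range(m):
            # delta(i, j) = |a_i - b_j|; dividing by k discretizes the
            # integral with respect to L1 path length (dx + dy).
            c = abs(a[i] - b[j]) / k
            if i > 0:
                D[i, j] = min(D[i, j], D[i - 1, j] + c)
            if j > 0:
                D[i, j] = min(D[i, j], D[i, j - 1] + c)
            if i > 0 and j > 0:
                D[i, j] = min(D[i, j], D[i - 1, j - 1] + 2 * c)
    return D[-1, -1]
```

Refining `k` shrinks the quantization error, mirroring the grid-refinement schemes discussed in Section 4; the exact algorithms instead propagate closed-form piecewise cost functions.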
2. Theoretical Properties and Comparison to Related Measures
CDTW interpolates between two canonical distance measures for time series and curves: classic DTW and the Fréchet distance. DTW minimizes the sum of costs over a discrete alignment path and is therefore highly sensitive to sampling density; the Fréchet distance takes the maximum (bottleneck) deviation along a continuous monotone coupling and is therefore sensitive to outliers. CDTW instead integrates the cost along a continuous monotone path, achieving both robustness to sampling rate and resilience to outliers (Buchin et al., 2022, Brankovic et al., 2020, Buchin et al., 25 Nov 2025).
Empirical studies confirm that clustering, map-matching, and averaging under CDTW avoid both the “staircase artifacts” associated with DTW and the spike-following sensitivity of the Fréchet distance (Brankovic et al., 2020, Buchin et al., 2022). In trajectory clustering with bounded-complexity centers, CDTW strictly outperforms both discrete DTW and Fréchet-based approaches with respect to reconstruction fidelity and robustness (Brankovic et al., 2020).
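The sampling-density effect can be seen in a small illustrative experiment: the same straight line sampled at different rates. Discrete DTW, which sums costs, grows as the sampling gets denser, while the discrete Fréchet distance, which takes a maximum, does not; only the `combine` rule differs between the two dynamic programs below.

```python
import numpy as np

def dp_align(a, b, combine):
    """Monotone-alignment DP; combine = addition for DTW, max for discrete Frechet."""
    n, m = len(a), len(b)
    D = np.full((n, m), np.inf)
    for i in range(n):
        for j in range(m):
            c = abs(a[i] - b[j])
            if i == 0 and j == 0:
                D[i, j] = c
                continue
            best = np.inf  # cheapest predecessor among the three monotone steps
            if i > 0:
                best = min(best, D[i - 1, j])
            if j > 0:
                best = min(best, D[i, j - 1])
            if i > 0 and j > 0:
                best = min(best, D[i - 1, j - 1])
            D[i, j] = combine(best, c)
    return D[-1, -1]

line5 = np.linspace(0, 1, 5)    # the same segment, three sampling rates
line9 = np.linspace(0, 1, 9)
line17 = np.linspace(0, 1, 17)

dtw_9 = dp_align(line5, line9, lambda acc, c: acc + c)    # grows with density
dtw_17 = dp_align(line5, line17, lambda acc, c: acc + c)
fre_9 = dp_align(line5, line9, max)                        # insensitive to density
fre_17 = dp_align(line5, line17, max)
```

Here `dtw_17 > dtw_9` even though all arrays sample the identical curve; CDTW, integrating over the continuous curves, would report zero in every case.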
3. Algorithms: Exact, Approximate, and Differentiable Solutions
One-Dimensional and Piecewise Linear Case
- Exact Dynamic Programming for 1D: In one dimension, the first exact polynomial-time algorithm for CDTW propagates piecewise-quadratic cost functions across the parameter grid, avoiding exponential blowup by controlling their combinatorial structure (Buchin et al., 2022).
- Approximation in Higher Dimensions: In 2D, exact algebraic computation is impossible under the Euclidean norm $L_2$ (the value can be transcendental), but for any fixed polygonal norm an exact DP can be constructed. Approximation algorithms use polygonal gauges to obtain a $(1+\varepsilon)$-approximation to the true Euclidean CDTW in polynomial time, leveraging cell-local optimality and lower-envelope propagation (Buchin et al., 25 Nov 2025).
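A minimal sketch of the polygonal-gauge idea (illustrative, not the paper's construction): the Euclidean norm is replaced by the gauge of a regular polygon, evaluated as a maximum of finitely many linear functionals, which approximates $\|\cdot\|_2$ arbitrarily well as the number of sides grows.

```python
import numpy as np

def kgon_norm(v, k=8):
    """Gauge of a regular 2k-gon circumscribing the unit disk:
    the maximum absolute projection onto k evenly spaced unit directions.
    Underestimates the Euclidean norm by at most a factor cos(pi/(2k))."""
    th = np.arange(k) * np.pi / k
    U = np.stack([np.cos(th), np.sin(th)], axis=1)  # k unit normals
    return np.max(np.abs(U @ np.asarray(v, float)))
```

Since each facet of the gauge is linear, cell-local DP subproblems under this norm stay algebraic, which is what makes the exact polygonal DP possible.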
Discrete-Time and Time-Series
- Bi-level Optimization & Implicit Differentiation: In deep learning applications, CDTW is formulated as a constrained variational problem where the best warping path is the solution to a lower-level optimization layer, and gradients with respect to data can be computed via implicit differentiation through the KKT conditions. The DecDTW framework explicitly recovers the optimal path (not a “soft” approximation) for end-to-end loss optimization (Xu et al., 2023).
- Block Coordinate Descent in Dictionary Learning: In time-series dictionary learning for classification, the continuous warping operator is parameterized by monotone basis functions (e.g., I-splines), and warping parameters are optimized jointly with sparse coding and dictionary atoms using block coordinate descent and quadratic programming (Xu et al., 2023).
- Sinc Interpolation and Gradient Descent: The Trainable Time Warping (TTW) approach parameterizes the warping function in a discrete sine basis and differentiates through ideal sinc interpolation for highly efficient gradient-based learning of continuous alignments (Khorram et al., 2019).
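The core mechanism of the sinc-interpolation approach can be sketched as follows (a simplified stand-in for TTW, with illustrative names): samples are treated as a band-limited signal via the Whittaker–Shannon formula, so evaluating the signal at warped times is a smooth function of the warp, and gradients with respect to warp parameters flow through.

```python
import numpy as np

def sinc_warp(x, t_warp):
    """Evaluate a band-limited reconstruction of samples x at (possibly
    non-integer) warped times t_warp via ideal sinc interpolation:
    x(t) = sum_n x[n] * sinc(t - n). Smooth in t_warp, hence differentiable
    with respect to any parameterization of the warp."""
    n = np.arange(len(x))
    return np.sinc(t_warp[:, None] - n[None, :]) @ x
```

At integer times the sinc matrix reduces to the identity, so the original samples are recovered exactly; non-integer warped times blend neighboring samples smoothly rather than through hard indexing.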
Pruning, Early Abandoning, and Special Cases
- Constrained CDTW (Band-limited): The Sakoe–Chiba band yields constrained DTW (sometimes also abbreviated CDTW in software libraries, which invites confusion with the continuous measure), restricting alignments to a fixed window around the diagonal and enabling significant speedups via early abandoning and pruning (Herrmann et al., 2021).
- Dynamic Frame Warping for Action Recognition: In video/action recognition, dynamic frame warping extends DTW to provide both recognition and segmentation in a single forward-backward pass over frame-level features, supporting simultaneous label assignment and segmentation (Kulkarni et al., 2014).
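The band-limited variant above can be sketched compactly (illustrative code; the cited work's pruning cascade is more elaborate): only cells inside the Sakoe–Chiba window are filled, and the computation abandons early once an entire band row exceeds a best-so-far cutoff.

```python
import numpy as np

def banded_dtw(a, b, w, cutoff=np.inf):
    """DTW restricted to a Sakoe-Chiba band of half-width w.
    Returns inf as soon as every entry of the current band row exceeds
    cutoff (e.g. the cost of a cheaper candidate found earlier)."""
    n, m = len(a), len(b)
    prev = np.full(m + 1, np.inf)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = np.full(m + 1, np.inf)
        lo, hi = max(1, i - w), min(m, i + w)
        for j in range(lo, hi + 1):
            c = abs(a[i - 1] - b[j - 1])
            cur[j] = c + min(prev[j], prev[j - 1], cur[j - 1])
        if cur[lo:hi + 1].min() > cutoff:  # early abandon: no path can recover
            return np.inf
        prev = cur
    return prev[m]
```

In nearest-neighbor search, `cutoff` is the distance to the best match found so far, which is what makes pruned banded DTW fast in practice.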
4. Optimization Frameworks and Numerical Schemes
CDTW problems typically reduce to discretized or continuous nonlinear programs with pointwise data terms, regularization on the cumulative and/or instantaneous warp, and boundary/monotonicity constraints (Deriso et al., 2019, Xu et al., 2023). Practical iterative schemes consist of:
- Discretization of both time and warp function, often forming a layered graph with monotonicity constraints.
- Dynamic programming over the discretized parameter space.
- Iterative refinement of the grid around optimal paths to decrease quantization error.
- Use of lower-envelopes, cell-local analytic forms, and graph-based shortest path computations for additive approximation under tolerance (Brankovic et al., 2020).
In differentiable cases, implicit differentiation computes the response of the optimal alignment to changes in inputs, supporting gradient flow in deep models (Xu et al., 2023). Block coordinate or alternating optimization is standard in joint dictionary/warping estimation (Xu et al., 2023).
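As a minimal illustration of the implicit-differentiation pattern (a toy smoothing layer with illustrative names, not the DecDTW alignment layer), consider a lower-level problem whose stationarity condition is linear: the upper-level gradient is then one linear solve with the lower-level Hessian, with no solver unrolling.

```python
import numpy as np

# Lower level (stand-in for the alignment layer):
#   phi*(y) = argmin_phi 0.5*||phi - y||^2 + 0.5*lam*||D phi||^2
# Stationarity: (I + lam * D^T D) phi* = y, so by the implicit function
# theorem d(phi*)/dy = (I + lam * D^T D)^{-1}.

def solve_lower(y, lam):
    n = len(y)
    D = np.diff(np.eye(n), axis=0)       # first-difference operator
    H = np.eye(n) + lam * D.T @ D        # Hessian of the lower-level objective
    return np.linalg.solve(H, y), H

def grad_upper_wrt_y(y, target, lam):
    """Gradient of L(y) = 0.5*||phi*(y) - target||^2 via implicit differentiation."""
    phi, H = solve_lower(y, lam)
    # dL/dy = (d phi*/dy)^T (phi - target); H is symmetric, so one solve suffices.
    return np.linalg.solve(H, phi - target)
```

In DecDTW the lower level is the constrained alignment program and the linear system comes from its KKT conditions, but the gradient computation follows this same pattern.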
5. Empirical Applications and Case Studies
CDTW and its variants have been empirically validated in several domains:
- Trajectory Clustering: Center-based $(k, \ell)$-medians clustering under CDTW achieves smoother, more robust cluster representatives, especially for low-complexity centers and in the presence of outliers, surpassing both DTW and Fréchet-based methods (Brankovic et al., 2020).
- Deep End-to-End Alignment: In music information retrieval (audio-to-score alignment) and visual place recognition, DecDTW yields state-of-the-art alignment errors and localization accuracy, outperforming Soft-DTW and fixed-path approaches (Xu et al., 2023).
- Classification & Dictionary Learning: CDTW-based dictionary learning (GTWIDL) achieves the highest mean accuracy in classification and clustering across multiple UCR datasets, yielding both improved discriminative power and better representations (Xu et al., 2023).
- Time-Series Averaging: TTW outperforms classical DTW averaging and generalized time warping on more than 65% of UCR datasets in centroid quality and classification accuracy, demonstrating practical and scalable differentiation-based continuous warping (Khorram et al., 2019).
- Video Action Recognition: Continuous action segmentation via Dynamic Frame Warping (DFW) improves frame-labeling accuracy in challenging datasets relative to prior isolated or bottleneck-based segmentation algorithms (Kulkarni et al., 2014).
6. Computational Barriers and Extensions
Recent theory reveals that exact computation of Euclidean CDTW in 2D (and higher) lies beyond the algebraic computation model over the rationals, as optimal costs and paths can be transcendental (Buchin et al., 25 Nov 2025). Approximation by regular polygonal norms gives polynomial-time schemes with bounded error. The same DP frameworks extend to partial and lexicographic Fréchet similarity, as well as arbitrary centrally-symmetric polygonal norms. Key open problems include:
- Polynomial bounding of the number of propagated cost-function pieces in 2D.
- Designing PTAS or further complexity improvements for Euclidean CDTW.
- Extending the framework to higher dimensions, alternative measures, and integrating measure-weighted or nonparametric time series.
7. Summary Table: Core CDTW Algorithms
| Problem/Domain | Exact/Approximate | Complexity / Feasibility | Key References |
|---|---|---|---|
| 1D polygonal curves | Exact | polynomial time | (Buchin et al., 2022) |
| 2D polygonal, $L_2$ | $(1+\varepsilon)$-approx | polynomial for $k$-gon norms | (Buchin et al., 25 Nov 2025) |
| End-to-end learning | Differentiable | scales with spline knots and grid size | (Xu et al., 2023) |
| Dictionary learning | Block-coordinate | — | (Xu et al., 2023) |
| Band-limited DTW | Exact/Pruned | quadratic worst-case, fast in practice | (Herrmann et al., 2021) |
| Time-series average | Differentiable | gradient step per iteration | (Khorram et al., 2019) |
Full methodological and theoretical details appear in the respective arXiv papers cited above.