
DualCD: Causal Disentanglement for Time Series

Updated 22 January 2026
  • DualCD is a modular framework that disentangles raw temporal features into causal (invariant) and spurious subspaces, enhancing prediction stability.
  • It employs dual-causal interventions—via intra-class and inter-class perturbations—to enforce class invariance and mitigate the effects of domain shifts.
  • Integrated with various time series classifiers, DualCD significantly improves predictive accuracy and reduces catastrophic forgetting in sequential learning.

The Dual-Causal Disentanglement Framework (DualCD) refers to a lightweight, modular architecture designed for domain-incremental time series classification (DI-TSC). DualCD enables robust continual learning by explicitly disentangling raw temporal features into causal and spurious subspaces, and deploying dual-causal intervention mechanisms to ensure prediction stability under intra-class and inter-class confounding variations. As a plug-and-play module, DualCD can be integrated into a variety of state-of-the-art time series classifiers, yielding significant improvements in both predictive accuracy and resilience to catastrophic forgetting across sequentially encountered domains (Liu et al., 15 Jan 2026).

1. Problem Setting and Causal Motivation

Domain-incremental time series classification (DI-TSC) organizes data as a sequence of domains D = {D^1, ..., D^T}, each containing samples (X_n^t, y_n^t). All domains share a label set Y, but the feature distributions p^t(X) differ. The training protocol requires the model f(·) to absorb each domain sequentially, retaining performance on past domains while learning the new one. Standard fine-tuning typically results in catastrophic forgetting, as the model's intermediate features absorb spurious, domain-specific information at the expense of stable class-causal structure.

DualCD is motivated by the causal principle that, for domain-robust classification, the decision function should depend exclusively on class-causal features (Z_R), which are invariant across domains, while being insensitive to nuisance or spurious features (Z_I) that encode non-causal variability. By disentangling the embedding Z into (Z_R, Z_I) and imposing causal interventions during training, DualCD prevents confounder-driven decision instability and supports long-term knowledge retention (Liu et al., 15 Jan 2026).

2. Temporal Feature Disentanglement

Given any time-series encoder ψ(·; Θ₁) (e.g., CNN, Transformer) mapping an input X_n^t to an embedding Z, DualCD produces two orthogonal masks:

  • M_R = σ(g(Z)) and M_I = 1 − M_R, where g(·) is a one-layer MLP and σ(·) is the sigmoid.
  • These masks decompose the raw feature as Z_R = M_R ⊙ Z and Z_I = M_I ⊙ Z, where ⊙ denotes element-wise multiplication and the complementarity constraint M_R + M_I = 1 enforces orthogonality.

Z_R thus represents the causal component (presumed invariant and predictive of the label y), while Z_I represents spurious nuisance features. Empirical studies confirm that imposing this hard orthogonality constraint yields significantly better robustness compared to architectures using two uncorrelated MLPs for feature splitting (see the ablation analysis in Section 5) (Liu et al., 15 Jan 2026).
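As a concrete illustration, this mask-based split can be sketched in a few lines of NumPy; the one-layer MLP here is a single random linear map with sigmoid activation, and the dimension and weights are hypothetical stand-ins rather than the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 8                              # embedding dimension (illustrative)
W = 0.1 * rng.normal(size=(d, d))  # one-layer MLP weights (hypothetical)
b = np.zeros(d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def disentangle(z):
    """Split an embedding z into causal (z_r) and spurious (z_i) parts
    via complementary sigmoid masks, so that z_r + z_i == z exactly."""
    m_r = sigmoid(z @ W + b)  # causal mask, entries in (0, 1)
    m_i = 1.0 - m_r           # complementary (orthogonal) spurious mask
    return m_r * z, m_i * z

z = rng.normal(size=d)
z_r, z_i = disentangle(z)
```

Because the two masks sum to one everywhere, the decomposition is lossless: z_r + z_i recovers z exactly.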

3. Dual-Causal Intervention Mechanisms

To operationalize the causal invariance of Z_R, DualCD introduces two complementary intervention protocols:

  1. Intra-Class Perturbation: For each training sample of class y, its spurious feature Z_I is replaced with a randomly selected Z_I' from another sample of class y (same class, possibly different domain). The classifier's prediction on the recombined feature Z_R + Z_I' is expected to remain y. The intra-class intervention loss is:

L_intra = CE(h(Z_R + Z_I'), y),

averaged over training samples, where h(·) is the classifier head and CE denotes cross-entropy.

  2. Inter-Class Perturbation: For each sample of class y, replace its spurious vector Z_I with a causal vector Z_R' from a sample of a different class. The perturbed embedding is Z_R + Z_R', and the classifier must still predict y:

L_inter = CE(h(Z_R + Z_R'), y).

The combined intervention objective, balanced by hyperparameter λ, is:

L_int = L_intra + λ · L_inter.

This dual intervention compels the classifier to base predictions strictly on Z_R, rendering it robust under confounding variation both within and between classes (Liu et al., 15 Jan 2026).
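A minimal sketch of the two interventions, assuming additive recombination of the mask-split parts and a generic classifier head; the random sample pairing, linear head, and hyperparameter values are all illustrative choices, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample."""
    logits = logits - logits.max()
    return -(logits[label] - np.log(np.exp(logits).sum()))

def intervention_loss(z_r, z_i, labels, classify, lam=1.0):
    """Sketch of the dual-causal intervention objective.
    Intra-class: each sample's spurious part is swapped with z_i' from
    another sample of the same class; inter-class: the spurious part is
    replaced by the causal part z_r' of a different-class sample. The
    classifier must predict the original label in both cases."""
    n = len(labels)
    l_intra = l_inter = 0.0
    for k in range(n):
        same = [j for j in range(n) if labels[j] == labels[k] and j != k]
        diff = [j for j in range(n) if labels[j] != labels[k]]
        if same:
            j = rng.choice(same)
            l_intra += cross_entropy(classify(z_r[k] + z_i[j]), labels[k])
        if diff:
            j = rng.choice(diff)
            l_inter += cross_entropy(classify(z_r[k] + z_r[j]), labels[k])
    return (l_intra + lam * l_inter) / n

# Toy usage: 4 samples, 2 classes, linear classifier head (all hypothetical).
d, n_cls = 6, 2
W_cls = rng.normal(size=(d, n_cls))
z_r = rng.normal(size=(4, d))
z_i = rng.normal(size=(4, d))
labels = np.array([0, 0, 1, 1])
loss = intervention_loss(z_r, z_i, labels, lambda z: z @ W_cls)
```

Swapping same-class spurious parts leaves the label unchanged by construction, while grafting another class's causal part onto a sample tests that the prediction is driven by the causal component alone.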

4. Model Integration and Optimization

DualCD is model-agnostic and inserts between any encoder ψ(·; Θ₁) and classifier head h(·; Θ₂). The total loss on each domain incorporates both the standard classification objective and the intervention regularizer:

L_total = L_cls + β · L_int,

where L_cls is computed on original (non-perturbed) features, and β controls intervention strength.

After each domain, parameters Θ = (Θ₁, Θ₂) are propagated as initialization for subsequent domains, supporting continual learning. The orthogonal mask structure incurs only an extra O(d) computational cost per embedding of dimension d.
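The sequential protocol itself — carry the learned parameters forward as the initialization for the next domain — can be illustrated with a toy warm-started learner; logistic regression and the synthetic shifted domains below are stand-ins for a real encoder/classifier stack, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)

def make_domain(shift, n=200, d=4):
    """Toy 2-class domain: the class-causal signal (dimension 0) is
    invariant, while the additive mean shift plays the role of
    domain-specific spurious variation."""
    y = rng.integers(0, 2, size=n)
    x = rng.normal(size=(n, d)) + shift
    x[:, 0] += np.where(y == 1, 2.0, -2.0)
    return x, y

def train_logreg(x, y, w, lr=0.1, epochs=50):
    """Plain logistic regression via gradient descent, warm-started
    from w, mirroring how domain t's parameters initialize domain t+1."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w = w - lr * x.T @ (p - y) / len(y)
    return w

w = np.zeros(4)
for shift in (0.0, 1.0, -1.0):  # three sequentially encountered domains
    x, y = make_domain(shift)
    w = train_logreg(x, y, w)   # parameters propagate to the next domain

x_test, y_test = make_domain(0.0)
acc = float(((x_test @ w > 0) == y_test).mean())
```

Because the class-causal direction is shared across the toy domains, the warm-started weights keep accumulating evidence for it even as the spurious shift changes sign.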

5. Empirical Evaluation

Experiments are conducted on subject-wise domain-organized benchmarks: HAR (6 human activity classes, 10 domains), HHAR (6 activity classes, 9 domains), ISRUC-S3 (5 sleep stages, 10 domains), and Sleep-EDF (5 sleep stages, 10 domains). The key metrics are:

  • Average Accuracy (ACC): mean classification accuracy across all encountered domains.
  • Relative Forgetting (RF): performance degradation on prior domains post-training.
  • Performance-aware Relative Forgetting (PRF): RF weighted by initial accuracy.
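For concreteness, one plausible reading of these metrics can be computed from an accuracy matrix acc[t, d] (accuracy on domain d after training through domain t); the exact formulas below are hypothetical reconstructions of the verbal definitions above, not equations from the paper:

```python
import numpy as np

def forgetting_metrics(acc):
    """acc[t, d]: accuracy on domain d measured after training on
    domains 1..t (entries above the diagonal are unused).
    RF averages the accuracy drop on each prior domain between the
    moment it was learned and the end of training; PRF normalizes
    each drop by that domain's initial accuracy (both hypothetical
    readings of the metrics' verbal definitions)."""
    final = acc[-1]          # accuracies after the last domain
    initial = np.diag(acc)   # accuracy right after learning each domain
    drops = initial[:-1] - final[:-1]
    rf = drops.mean()
    prf = (drops / initial[:-1]).mean()
    return rf, prf

# Example: 3 domains; accuracy on domain 1 decays from 0.90 to 0.70.
acc = np.array([[0.90, 0.00, 0.00],
                [0.80, 0.85, 0.00],
                [0.70, 0.80, 0.90]])
rf, prf = forgetting_metrics(acc)
```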

DualCD demonstrates substantial improvements across all metrics and backbones. On HAR, for example, the best prior domain-incremental method (DualCP) achieves ACC=0.8302, RF=0.1562, PRF=0.0310, whereas DualCD attains ACC=0.8565, RF=0.1410, PRF=0.0266. Relative gains of 3–9% in ACC and 10–20% reduction in RF/PRF occur across datasets. Plugging DualCD into six backbone models yields 5–80% (relative) ACC improvement and 20–78% reduction in PRF, indicating generality (Liu et al., 15 Jan 2026).

Ablation analyses reveal that omitting either intervention term (L_intra or L_inter) drops accuracy by 9–14% and inflates PRF by 65–94%. Replacing the orthogonal mask with disjoint MLP modules severely degrades robustness, confirming the efficacy of the dual-causal, disentangled structure.

6. Learned Representation Visualization and Analysis

t-SNE visualizations on HAR’s first domain show that vanilla backbones produce heavily overlapping class clusters, while DualCD’s Z_R forms well-separated, compact clusters, indicating successful causal disentanglement. KL-divergence measures across domains further show that vanilla encoders induce feature collapse (low inter-domain KL), characteristic of forgetting, whereas DualCD preserves richer domain-invariant causal information in Z_R.

7. Broader Implications, Generality, and Limitations

DualCD’s core contributions—orthogonal feature disentanglement and dual-causal intervention—yield class-invariant representations functionally robust to sequential domain shifts, thereby mitigating catastrophic forgetting in DI-TSC. Its modular configuration enables broad applicability with minimal overhead. While current experiments demonstrate efficacy in time series data, the direct transposability to non-temporal domains remains to be systematically established. A plausible implication is that similar causal disentanglement and intervention schemes may benefit other forms of continual or transfer learning characterized by non-stationary confounding.

Summary Table: Key Components and Roles in DualCD (Liu et al., 15 Jan 2026)

Component | Description | Role
Temporal feature disentanglement | Orthogonal masks split Z into Z_R (causal) and Z_I (spurious) | Builds causal invariance
Intra-class intervention | Mixes Z_R with spurious features Z_I' from the same class | Enforces class stability
Inter-class intervention | Combines Z_R with causal features Z_R' from another class | Ensures discriminability
Causal intervention loss | Cross-entropy on perturbed samples | Forces label invariance
Plug-and-play integration | Modular between encoder and classifier | Generalizes across backbones

The Dual-Causal Disentanglement Framework establishes a principled and empirically validated approach for harnessing causal invariants in domain-incremental sequential learning contexts, with demonstrated superiority over existing backbone and continual/domain-incremental learning techniques (Liu et al., 15 Jan 2026).

References (1)
