DualCD: Causal Disentanglement for Time Series
- DualCD is a modular framework that disentangles raw temporal features into causal (invariant) and spurious subspaces, enhancing prediction stability.
- It employs dual-causal interventions—via intra-class and inter-class perturbations—to enforce class invariance and mitigate the effects of domain shifts.
- Integrated with various time series classifiers, DualCD significantly improves predictive accuracy and reduces catastrophic forgetting in sequential learning.
The Dual-Causal Disentanglement Framework (DualCD) is a lightweight, modular architecture designed for domain-incremental time series classification (DI-TSC). DualCD enables robust continual learning by explicitly disentangling raw temporal features into causal and spurious subspaces, and by deploying dual-causal intervention mechanisms to ensure prediction stability under intra-class and inter-class confounding variations. As a plug-and-play module, DualCD can be integrated into a variety of state-of-the-art time series classifiers, yielding significant improvements in both predictive accuracy and resilience to catastrophic forgetting across sequentially encountered domains (Liu et al., 15 Jan 2026).
1. Problem Setting and Causal Motivation
Domain-incremental time series classification (DI-TSC) organizes data as a sequence of domains $\mathcal{D}_1, \dots, \mathcal{D}_T$, each containing labeled samples $(x, y)$. All domains share a common label set $\mathcal{Y}$, but their feature distributions differ. The training protocol requires the model to absorb each domain sequentially, retaining performance on past domains while learning the new one. Standard fine-tuning typically results in catastrophic forgetting, as the model's intermediate features absorb spurious, domain-specific information at the expense of stable class-causal structure.
DualCD is motivated by the causal principle that, for domain-robust classification, the decision function should depend exclusively on class-causal features ($z_c$), which are invariant across domains, while being insensitive to nuisance or spurious features ($z_s$) that encode non-causal variability. By disentangling the embedding $z$ into $z_c$ and $z_s$ and imposing causal interventions during training, DualCD prevents confounder-driven decision instability and supports long-term knowledge retention (Liu et al., 15 Jan 2026).
2. Temporal Feature Disentanglement
Given any time-series encoder $f_\theta$ (e.g., CNN, Transformer) mapping an input $x$ to an embedding $z \in \mathbb{R}^d$, DualCD produces two orthogonal masks:
- $m_c = \sigma(\mathrm{MLP}(z))$ and $m_s = \mathbf{1} - m_c$, where the mask logits are produced by a one-layer MLP and $\sigma$ is the sigmoid.
- These masks decompose the raw feature as $z_c = m_c \odot z$ and $z_s = m_s \odot z$, where $\odot$ denotes element-wise multiplication and the complementary mask construction enforces orthogonality.
$z_c$ thus represents the causal component (presumed invariant and predictive of the label $y$), while $z_s$ represents spurious nuisance features. Empirical studies confirm that imposing this hard orthogonality constraint yields significantly better robustness compared to architectures using two uncorrelated MLPs for feature splitting (see the ablation analyses in Section 5) (Liu et al., 15 Jan 2026).
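The mask-based split can be sketched in a few lines of NumPy. This is a minimal illustration under the complementary-mask reading above, with made-up dimensions and a randomly initialized one-layer mask network, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical one-layer MLP producing mask logits from the embedding z.
d = 8                        # embedding dimension (illustrative)
W = rng.normal(size=(d, d))
b = np.zeros(d)

def disentangle(z):
    """Split z into causal and spurious parts via complementary soft masks."""
    m_c = sigmoid(z @ W + b)  # causal mask, entries in (0, 1)
    m_s = 1.0 - m_c           # complementary (spurious) mask
    z_c = m_c * z             # causal component
    z_s = m_s * z             # spurious component
    return z_c, z_s

z = rng.normal(size=d)
z_c, z_s = disentangle(z)

# Because the masks sum to one elementwise, the split is an exact
# additive decomposition: z_c + z_s reconstructs z.
assert np.allclose(z_c + z_s, z)
```

The additive reconstruction property is what allows the intervention mechanisms below to recombine causal and spurious parts across samples.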
3. Dual-Causal Intervention Mechanisms
To operationalize the causal invariance of $z_c$, DualCD introduces two complementary intervention protocols:
- Intra-Class Perturbation: For each training sample of class $y$, its spurious feature $z_s$ is replaced with a randomly selected $\tilde{z}_s$ from another sample of the same class $y$ (possibly from a different domain). The classifier's prediction on the perturbed embedding $z_c + \tilde{z}_s$ is expected to remain $y$. The intra-class intervention loss is:

$$\mathcal{L}_{\text{intra}} = \mathbb{E}_{(x,y)}\left[\ell_{\mathrm{CE}}\big(h_\phi(z_c + \tilde{z}_s),\, y\big)\right]$$

- Inter-Class Perturbation: For each sample of class $y$, replace its spurious vector $z_s$ with a causal vector $z_c'$ from a sample of a different class. The perturbed embedding is $z_c + z_c'$, and the classifier must still predict $y$:

$$\mathcal{L}_{\text{inter}} = \mathbb{E}_{(x,y)}\left[\ell_{\mathrm{CE}}\big(h_\phi(z_c + z_c'),\, y\big)\right]$$

The combined intervention objective, balanced by a hyperparameter $\lambda$:

$$\mathcal{L}_{\text{int}} = \mathcal{L}_{\text{intra}} + \lambda\, \mathcal{L}_{\text{inter}}$$

This dual intervention compels the classifier to base predictions strictly on $z_c$, rendering it robust under confounding variation both within and between classes (Liu et al., 15 Jan 2026).
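Under the additive reading of the decomposition, both perturbations amount to recombining causal and spurious parts across samples before taking the cross-entropy against the original labels. The following sketch uses a toy linear classifier and hand-picked swap indices; all names and values are illustrative, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_classes = 8, 3

# Toy batch: causal and spurious parts per sample, plus labels.
z_c = rng.normal(size=(4, d))
z_s = rng.normal(size=(4, d))
y = np.array([0, 0, 1, 2])

W = rng.normal(size=(d, n_classes)) * 0.1  # hypothetical linear classifier

def cross_entropy(z, labels):
    """Mean cross-entropy of softmax(z @ W) against the given labels."""
    logits = z @ W
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.mean(np.log(p[np.arange(len(labels)), labels]))

# Intra-class perturbation: swap spurious parts between samples 0 and 1
# (both class 0); the prediction target stays the original label.
z_intra = z_c + z_s[[1, 0, 2, 3]]
loss_intra = cross_entropy(z_intra, y)

# Inter-class perturbation: splice in the causal part of a different-class
# sample in place of the spurious part; still predict the original label.
z_inter = z_c + z_c[[2, 2, 0, 0]]
loss_inter = cross_entropy(z_inter, y)

lam = 0.5  # balancing hyperparameter (illustrative value)
loss_interv = loss_intra + lam * loss_inter
```

In practice the swap partners would be sampled randomly within the stated class constraints rather than fixed as here.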
4. Model Integration and Optimization
DualCD is model-agnostic and inserts between any encoder $f_\theta$ and classifier $h_\phi$. The total loss on each domain incorporates both the standard classification objective and the intervention regularizer:

$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \beta\, \mathcal{L}_{\text{int}}$$

where $\mathcal{L}_{\mathrm{CE}}$ is computed on original (non-perturbed) features, and $\beta$ controls intervention strength.
After each domain, the parameters $(\theta, \phi)$ are propagated as the initialization for subsequent domains, supporting continual learning. The orthogonal mask structure incurs only $O(d)$ extra computational cost for embedding dimension $d$.
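The loss combination and the sequential warm-starting protocol can be sketched as follows. The least-squares inner loop is only a stand-in for the real training step, and `lam`/`beta` are illustrative hyperparameter names:

```python
import numpy as np

rng = np.random.default_rng(2)

def total_loss(loss_ce, loss_intra, loss_inter, lam=0.5, beta=1.0):
    """Standard classification loss plus the weighted intervention regularizer."""
    return loss_ce + beta * (loss_intra + lam * loss_inter)

def train_on_domain(theta, X, y, lr=0.1, steps=50):
    """Schematic per-domain training: gradient steps on a toy least-squares
    objective standing in for the real classification objective."""
    for _ in range(steps):
        grad = X.T @ (X @ theta - y) / len(y)
        theta = theta - lr * grad
    return theta

# Domain-incremental protocol: parameters trained on one domain serve as
# the initialization for the next.
d = 4
theta = np.zeros(d)                       # initial parameters
domains = [(rng.normal(size=(32, d)), rng.normal(size=32)) for _ in range(3)]
for X, y in domains:
    theta = train_on_domain(theta, X, y)  # warm-start from previous domain
```

The key point is the absence of any replay buffer or architectural growth: only the current domain's data and the inherited parameters are used at each stage.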
5. Empirical Evaluation
Experiments are conducted on subject-wise domain-organized benchmarks: HAR (6 human activity classes, 10 domains), HHAR (6 activity classes, 9 domains), ISRUC-S3 (5 sleep stages, 10 domains), and Sleep-EDF (5 sleep stages, 10 domains). The key metrics are:
- Average Accuracy (ACC): mean classification accuracy across all domains after training on the final domain.
- Relative Forgetting (RF): performance degradation on prior domains post-training.
- Performance-aware Relative Forgetting (PRF): RF weighted by initial accuracy.
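These metrics can be computed from the accuracy matrix standard in continual learning, where entry $(i, j)$ is the accuracy on domain $j$ after training through domain $i$. The sketch below uses one common formulation of forgetting; the paper's exact RF and PRF definitions may differ in detail, and the weighting used for PRF here is an assumption:

```python
import numpy as np

# A[i, j] = accuracy on domain j after training sequentially through domain i.
# Toy numbers, illustrative only.
A = np.array([
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.75, 0.80, 0.88],
])
T = A.shape[0] - 1  # index of the final training stage

# Average accuracy over all domains after the last stage.
acc = A[T, :].mean()

# Forgetting per earlier domain: drop from its just-trained accuracy
# (the diagonal) to its final accuracy, averaged over prior domains.
diag = np.diag(A)[:T]
final = A[T, :T]
rf = np.mean(diag - final)

# Performance-aware variant: weight each domain's drop by its initial
# (just-trained) accuracy -- an assumed reading of "weighted by initial
# accuracy".
prf = np.mean((diag - final) * diag)
```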
DualCD demonstrates substantial improvements across all metrics and backbones. On HAR, for example, the best prior domain-incremental method (DualCP) achieves ACC=0.8302, RF=0.1562, PRF=0.0310, whereas DualCD attains ACC=0.8565, RF=0.1410, PRF=0.0266. Relative gains of 3–9% in ACC and 10–20% reduction in RF/PRF occur across datasets. Plugging DualCD into six backbone models yields 5–80% (relative) ACC improvement and 20–78% reduction in PRF, indicating generality (Liu et al., 15 Jan 2026).
Ablation analyses reveal that omitting either intervention term ($\mathcal{L}_{\text{intra}}$ or $\mathcal{L}_{\text{inter}}$) drops accuracy by 9–14% and inflates PRF by 65–94%. Replacing the orthogonal mask with disjoint MLP modules severely degrades robustness, confirming the efficacy of the dual-causal, disentangled structure.
6. Learned Representation Visualization and Analysis
t-SNE visualizations on HAR’s first domain show that vanilla backbones produce heavily overlapping class clusters, while DualCD’s $z_c$ forms well-separated, compact clusters, indicating successful causal disentanglement. KL-divergence measures across domains further show that vanilla encoders induce feature collapse (low inter-domain KL), characteristic of forgetting, whereas DualCD preserves richer domain-invariant causal information in $z_c$.
7. Broader Implications, Generality, and Limitations
DualCD’s core contributions—orthogonal feature disentanglement and dual-causal intervention—yield class-invariant representations functionally robust to sequential domain shifts, thereby mitigating catastrophic forgetting in DI-TSC. Its modular configuration enables broad applicability with minimal overhead. While current experiments demonstrate efficacy in time series data, the direct transposability to non-temporal domains remains to be systematically established. A plausible implication is that similar causal disentanglement and intervention schemes may benefit other forms of continual or transfer learning characterized by non-stationary confounding.
Summary Table: Key Components and Roles in DualCD (Liu et al., 15 Jan 2026)
| Component | Description | Role |
|---|---|---|
| Temporal feature disentanglement | Orthogonal masks split $z$ into $z_c$ (causal) and $z_s$ (spurious) | Builds causal invariance |
| Intra-class intervention | Mixes $z_c$ with spurious features from the same class | Enforces class stability |
| Inter-class intervention | Combines $z_c$ with $z_c'$ from another class | Ensures discriminability |
| Causal intervention loss | Cross-entropy on perturbed samples | Forces label invariance |
| Plug-and-play integration | Modular between encoder and classifier | Generalizes to backbones |
The Dual-Causal Disentanglement Framework establishes a principled and empirically validated approach for harnessing causal invariants in domain-incremental sequential learning contexts, with demonstrated superiority over existing backbone and continual/domain-incremental learning techniques (Liu et al., 15 Jan 2026).