- The paper introduces CaRTeD, which jointly learns temporal causal structures and performs irregular tensor decomposition on high-dimensional data.
- It extends PARAFAC2 and integrates dynamic Bayesian networks to capture both contemporaneous and temporal dependencies among latent clusters.
- Empirical results on synthetic and real EHR datasets demonstrate superior tensor and causal graph recovery with theoretical convergence guarantees.
Temporal Causal Representation Learning with Tensor Decomposition: An Expert Overview
This work introduces CaRTeD, a joint-learning framework for temporal causal representation learning and irregular tensor decomposition, with a focus on high-dimensional, temporally irregular data such as electronic health records (EHRs). The paper addresses the challenge of inferring causal structures among latent clusters (e.g., phenotypes) in settings where data are naturally represented as irregular tensors, and where both the clustering and the causal relationships are of primary interest.
Problem Setting and Motivation
Traditional causal representation learning (CRL) methods are typically designed for flat, low-dimensional data and do not scale to high-dimensional, irregularly sampled time series. In many real-world applications, such as EHRs, data are best modeled as tensors with one or more irregular modes (e.g., patients with varying numbers of visits). Existing tensor decomposition methods, such as CP and PARAFAC2, have been used for computational phenotyping but do not incorporate causal structure learning; conversely, causal structure learning methods do not incorporate tensor decomposition.
The key innovation in this work is the integration of temporal causal structure learning with irregular tensor decomposition, enabling the discovery of both meaningful clusters (phenotypes) and their contemporaneous and temporal causal relationships directly from high-dimensional, irregular data.
Methodological Contributions
The CaRTeD framework unifies temporal causal network inference and computational phenotyping via a joint optimization problem. The main components are:
- Irregular Tensor Decomposition: The framework builds on and extends PARAFAC2, a decomposition for irregular tensors in which one mode (e.g., time/visits) varies in length across slices (e.g., patients). The decomposition yields patient-specific temporal trajectories, phenotype memberships, and a global phenotype-feature matrix.
- Temporal Causal Structure Learning: Causal relationships among latent clusters are modeled using a dynamic Bayesian network (DBN) formalism, capturing both intra-slice (contemporaneous) and inter-slice (temporal) dependencies.
- Joint Optimization: The objective function combines tensor reconstruction loss with a causal regularization term, enforcing that the latent trajectories conform to a sparse, acyclic temporal causal model. The optimization is performed via block coordinate descent, with each block (tensor factors and causal parameters) updated using ADMM or first-order methods.
- Theoretical Guarantees: The paper provides a convergence analysis for the non-convex, constrained optimization problem, filling a gap in the literature regarding theoretical guarantees for irregular tensor decomposition with non-convex constraints.
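To make the components above concrete, the following is a minimal NumPy sketch of how a PARAFAC2 reconstruction term and a linear DBN penalty could be combined into one objective. It is an illustration under stated assumptions, not the paper's implementation: the names `Q`, `H`, `S`, `V`, `W`, `A`, and `lam`, the lag-1 linear form of the DBN, and the choice to apply the causal penalty to the trajectories `U_k = Q_k @ H` are all assumptions made here, and the sparsity and acyclicity terms are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

R, J = 3, 8              # R latent phenotypes, J observed features (illustrative sizes)
visits = [5, 7, 4]       # irregular mode: number of visits differs per patient

def random_orthonormal(rows, cols, rng):
    """Return a rows x cols matrix with orthonormal columns (needs rows >= cols)."""
    q, _ = np.linalg.qr(rng.normal(size=(rows, cols)))
    return q

# PARAFAC2 factors: X_k ~ U_k @ S_k @ V.T with U_k = Q_k @ H.
V = rng.normal(size=(J, R))                                   # global feature-by-phenotype matrix
H = rng.normal(size=(R, R))                                   # shared across patients
Q = [random_orthonormal(n, R, rng) for n in visits]           # patient-specific, Q_k.T @ Q_k = I
S = [np.diag(rng.uniform(0.5, 1.5, size=R)) for _ in visits]  # patient-specific phenotype weights

# Noise-free irregular tensor: one (visits_k x J) slice per patient.
X = [Q_k @ H @ S_k @ V.T for Q_k, S_k in zip(Q, S)]

def reconstruction_loss(X, Q, H, S, V):
    """Sum over patients of 0.5 * ||X_k - Q_k H S_k V^T||_F^2."""
    return sum(
        0.5 * np.linalg.norm(X_k - Q_k @ H @ S_k @ V.T) ** 2
        for X_k, Q_k, S_k in zip(X, Q, S)
    )

def dbn_residual_loss(U, W, A):
    """Assumed linear lag-1 DBN on rows of U: U[t] ~ U[t] @ W + U[t-1] @ A."""
    cur, prev = U[1:], U[:-1]
    return 0.5 * np.linalg.norm(cur - cur @ W - prev @ A) ** 2

def joint_objective(X, Q, H, S, V, W, A, lam=1.0):
    """Reconstruction loss plus lam times the causal residual on each U_k."""
    causal = sum(dbn_residual_loss(Q_k @ H, W, A) for Q_k in Q)
    return reconstruction_loss(X, Q, H, S, V) + lam * causal

# Hypothetical causal parameters: W is strictly upper triangular (hence acyclic),
# A carries lag-1 self-dependence only.
W = np.triu(0.3 * np.ones((R, R)), k=1)
A = 0.5 * np.eye(R)

print("joint objective:", joint_objective(X, Q, H, S, V, W, A, lam=1.0))
```

In a block coordinate descent scheme, one block would minimize this quantity over the tensor factors with `W` and `A` fixed, and the other over `W` and `A` with the factors fixed.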
Algorithmic Structure
The overall algorithm alternates between updating the tensor factors (using consensus ADMM with causal regularization) and updating the causal structure (using aggregated patient-level statistics and acyclicity constraints). Each block update reduces to a subproblem that either has a closed-form solution or can be solved efficiently, and the framework scales to large, sparse datasets.
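The acyclicity constraint in the causal update is commonly handled with a smooth function that is zero exactly when a weighted adjacency matrix has no directed cycles. The overview does not say which formulation the paper uses, so the sketch below assumes one well-known polynomial form, h(W) = tr((I + (W ∘ W)/d)^d) − d; the function name `acyclicity` and the example matrices are hypothetical.

```python
import numpy as np

def acyclicity(W):
    """Polynomial acyclicity measure tr((I + W*W / d)^d) - d.

    W is a d x d weighted adjacency matrix with W[i, j] != 0 meaning an edge i -> j.
    The value is 0 if the graph is acyclic and strictly positive otherwise.
    """
    d = W.shape[0]
    M = np.eye(d) + (W * W) / d          # elementwise square keeps entries non-negative
    return np.trace(np.linalg.matrix_power(M, d)) - d

# Chain 0 -> 1 -> 2: a DAG.
dag = np.array([[0.0, 1.5, 0.0],
                [0.0, 0.0, -0.7],
                [0.0, 0.0, 0.0]])

# Adding 2 -> 0 closes the cycle 0 -> 1 -> 2 -> 0.
cyclic = dag.copy()
cyclic[2, 0] = 0.9

print("h(dag)    =", acyclicity(dag))
print("h(cyclic) =", acyclicity(cyclic))
```

In a DBN only the intra-slice (contemporaneous) matrix needs this constraint, since inter-slice edges always point forward in time and cannot form a cycle.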
Empirical Evaluation
Synthetic Data
Experiments on simulated irregular tensors with known ground-truth causal structures demonstrate that CaRTeD outperforms state-of-the-art baselines (e.g., COPA for tensor decomposition, DDBN for causal discovery) in both tensor recovery, measured by cross-product invariance (CPI), similarity (SIM), and recovery rate (RR), and causal graph recovery, measured by structural Hamming distance (SHD), true positive rate (TPR), and false discovery rate (FDR). Notably, the joint-learning approach yields more accurate and stable results, especially in low-sample or high-noise regimes.
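For readers unfamiliar with the graph recovery metrics, here is a small sketch of how SHD, TPR, and FDR can be computed from binary directed adjacency matrices. Conventions differ between papers (for example, whether a reversed edge counts once or twice toward SHD), so this follows one common choice and is not necessarily the paper's evaluation code; `graph_metrics` and the example graphs are hypothetical.

```python
import numpy as np

def graph_metrics(true_adj, est_adj):
    """SHD, TPR, and FDR for directed graphs given as adjacency matrices.

    Entry [i, j] != 0 means an edge i -> j. Both graphs are assumed to have
    no 2-cycles. A reversed edge counts once toward SHD and as a false discovery.
    """
    T = np.asarray(true_adj) != 0
    E = np.asarray(est_adj) != 0

    true_pos = np.sum(T & E)                 # edge present with the correct direction
    reversed_edges = np.sum(E & ~T & T.T)    # estimated i -> j, truth has j -> i
    extra = np.sum(E & ~T & ~T.T)            # estimated edge, no true edge either way
    missing = np.sum(T & ~E & ~E.T)          # true edge, nothing estimated either way

    shd = int(extra + missing + reversed_edges)
    tpr = float(true_pos / max(T.sum(), 1))
    fdr = float((reversed_edges + extra) / max(E.sum(), 1))
    return {"shd": shd, "tpr": tpr, "fdr": fdr}

# Ground truth on 4 nodes: 0 -> 1, 1 -> 2, 0 -> 2.
truth = np.zeros((4, 4), dtype=int)
truth[0, 1] = truth[1, 2] = truth[0, 2] = 1

# Estimate: 0 -> 1 (correct), 2 -> 1 (reversed), 0 -> 3 (extra); 0 -> 2 is missed.
estimate = np.zeros((4, 4), dtype=int)
estimate[0, 1] = estimate[2, 1] = estimate[0, 3] = 1

print(graph_metrics(truth, estimate))
```

Lower SHD and FDR and higher TPR indicate better recovery of the ground-truth graph.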
Real-World EHR Application
Applied to the MIMIC-III EHR dataset, CaRTeD extracts clinically meaningful phenotypes and infers plausible causal relationships among them. The inferred causal phenotype network aligns with established medical knowledge (e.g., hypertension → kidney disease, kidney disease → heart failure) and is more interpretable than the results of sequential or non-causal baselines.
Implications and Future Directions
Practical Implications
- Healthcare Analytics: The framework enables simultaneous discovery of disease phenotypes and their temporal causal relationships, supporting more nuanced risk modeling, intervention planning, and hypothesis generation in clinical research.
- Generalizability: While demonstrated on EHRs, the methodology is applicable to any domain with high-dimensional, irregularly sampled time series data (e.g., genomics, sensor networks, longitudinal social data).
- Explainability: By integrating causal constraints into the decomposition, the learned representations are more interpretable and actionable for downstream tasks.
Theoretical and Methodological Implications
- Convergence Analysis: The theoretical results provide a foundation for further development of non-convex, constrained tensor decomposition algorithms, particularly in settings with complex regularization.
- Extension to Heterogeneous and Nonlinear Models: The current framework assumes a single, time-invariant DBN structure and linear relationships. Future work could extend to time-varying, subgroup-specific, or nonlinear causal models (e.g., via Gaussian processes or neural networks).
- Handling Mixed Data Types and Confounding: Extensions to mixed continuous/discrete data and explicit modeling of hidden confounders are promising directions, especially for real-world biomedical applications.
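The single, time-invariant, linear DBN assumption noted above can be written down explicitly. The notation below is an assumption made for illustration rather than the paper's own: u_t is the row vector of latent phenotype values at visit t, W holds the contemporaneous (intra-slice) weights, A holds the lag-1 (inter-slice) weights, and ε_t is noise.

```latex
% Assumed linear, time-invariant, lag-1 DBN on latent phenotype values
\mathbf{u}_t = \mathbf{u}_t W + \mathbf{u}_{t-1} A + \boldsymbol{\varepsilon}_t,
\qquad t = 2, \dots, T_k,
\qquad \text{with the graph of } W \text{ acyclic.}
```

Time invariance means the same W and A apply at every visit and for every patient; the extensions listed above would let them vary with t or with patient subgroup, or replace the linear maps with nonlinear functions.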
Key Results and Claims
- Superior Recovery: CaRTeD achieves higher similarity and recovery rates for tensor factors, and lower SHD and FDR for causal graphs, compared to state-of-the-art baselines, across a range of noise levels and sample sizes.
- Scalability and Robustness: The method is shown to be scalable to large, sparse datasets and robust to initialization, with warm-start strategies further improving performance.
- First Theoretical Guarantee: The paper provides, to the authors' knowledge, the first convergence guarantee for AO-ADMM-based irregular tensor decomposition with non-convex constraints.
Conclusion
CaRTeD jointly models temporal causal structure and high-dimensional, irregular data by integrating tensor decomposition with causal discovery. Supported by its empirical results and convergence analysis, it offers a route to interpretable, data-driven analysis in healthcare and other domains. Natural next steps include relaxing the model assumptions, incorporating richer data types, and scaling to larger and more heterogeneous datasets.