KGCM-VAE: Knowledge-Guided Causal VAE
- The paper introduces KGCM-VAE, a model that estimates individualized treatment effects using a velocity modulation scheme to incorporate physical constraints.
- It balances latent representations with Maximum Mean Discrepancy and enforces causal structure via an adjacency-constrained decoder reflecting domain knowledge.
- Experimental results on Arctic sea ice data show improved causal effect estimation and forecasting performance compared to baseline models.
The Knowledge-Guided Causal Model Variational Autoencoder (KGCM-VAE) is a sequential variational inference framework developed for quantifying causal effects in time-varying systems with strong physical constraints and latent confounding. Designed for applications such as Arctic sea ice dynamics, KGCM-VAE implements a velocity modulation scheme to create physically grounded causal treatments, employs Maximum Mean Discrepancy (MMD) for latent space balancing, and enforces a causal adjacency-constrained decoder reflecting domain knowledge. This results in robust estimation of individualized treatment effects (ITE) and improved causal mechanism discovery in spatiotemporal data settings (Sampath et al., 25 Jan 2026).
1. Formal Structure and Model Architecture
KGCM-VAE models the temporal evolution of observed, latent, and treatment variables at each time step $t$:
- $x_t$: observed covariates (e.g., sea ice thickness, SSH, velocity components).
- $z_t$: latent system state encoding unobserved confounders.
- $a_t$: time-varying, physically constructed treatment vector.
The generative model is factorized as
$$p_\theta(x_{1:T}, z_{1:T} \mid a_{1:T}) = \prod_{t=1}^{T} p_\theta(z_t \mid z_{t-1}, a_t)\, p_\theta(x_t \mid z_t, a_t),$$
with each conditional parameterized by a neural network.
The inference (encoder) step is realized by $q_\phi(z_t \mid x_{1:t}, a_{1:t})$, which is implemented as a GRU-based Gaussian encoder.
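A GRU-based Gaussian encoder of this kind can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the layer sizes, weight initialization, and single-cell (rather than Bi-GRU) structure are all illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUGaussianEncoder:
    """Minimal GRU cell whose final hidden state parameterizes q(z_t | x_{1:t}, a_{1:t})."""

    def __init__(self, in_dim, hid_dim, z_dim, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        # Gate weights: update (u), reset (r), candidate (h).
        self.Wu = rng.normal(0, s, (hid_dim, in_dim + hid_dim))
        self.Wr = rng.normal(0, s, (hid_dim, in_dim + hid_dim))
        self.Wh = rng.normal(0, s, (hid_dim, in_dim + hid_dim))
        # Gaussian head: mean and log-variance of the latent state.
        self.Wmu = rng.normal(0, s, (z_dim, hid_dim))
        self.Wlv = rng.normal(0, s, (z_dim, hid_dim))

    def step(self, h, xa):
        cat = np.concatenate([xa, h])
        u = sigmoid(self.Wu @ cat)                        # update gate
        r = sigmoid(self.Wr @ cat)                        # reset gate
        c = np.tanh(self.Wh @ np.concatenate([xa, r * h]))  # candidate state
        return (1 - u) * h + u * c

    def encode(self, x_seq, a_seq):
        """Run the GRU over concatenated (x_t, a_t) inputs; return (mu, log_var)."""
        h = np.zeros(self.Wu.shape[0])
        for x, a in zip(x_seq, a_seq):
            h = self.step(h, np.concatenate([x, a]))
        return self.Wmu @ h, self.Wlv @ h
```

The Gaussian head returning a mean and log-variance is the standard amortized-inference design for sequential VAEs; sampling would then use the reparameterization trick.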
2. Velocity Modulation and Treatment Construction
KGCM-VAE introduces a knowledge-guided, physically interpretable mechanism for treatment assignment. The causal treatment is constructed from the sea surface height (SSH) as follows:
- Compute the SSH increment: $\Delta \mathrm{SSH}_t = \mathrm{SSH}_t - \mathrm{SSH}_{t-1}$.
- Apply temporal smoothing (e.g., a moving average over a window of length $w$) to obtain the velocity signal: $v_t = \frac{1}{w} \sum_{j=0}^{w-1} \Delta \mathrm{SSH}_{t-j}$.
- Apply sigmoid gating to modulate the velocity according to SSH transitions: $g_t = \sigma\big(\kappa (|v_t| - b)\big)$, where $\kappa$ (steepness) and $b$ (bias/threshold) are hyperparameters.
- Set the treatment vector $a_t = g_t \cdot v_t$, ensuring that only high-magnitude, physically relevant SSH changes generate significant “intervention” signals.
This scheme both encodes domain knowledge and ensures that causal inference respects physically plausible interventions in ocean–ice dynamics.
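The four construction steps (increment, smoothing, sigmoid gating, modulation) can be sketched in NumPy. The window length and the gating steepness/threshold values below are illustrative assumptions, not values reported in the paper.

```python
import numpy as np

def build_treatment(ssh, window=3, kappa=10.0, b=0.05):
    """Knowledge-guided treatment signal from an SSH series (illustrative hyperparameters).

    1. increment : d_t = SSH_t - SSH_{t-1}
    2. smoothing : v_t = moving average of d over `window`
    3. gating    : g_t = sigmoid(kappa * (|v_t| - b))
    4. treatment : a_t = g_t * v_t
    """
    d = np.diff(ssh, prepend=ssh[0])                      # SSH increment (zero at t=0)
    kernel = np.ones(window) / window
    v = np.convolve(d, kernel, mode="same")               # smoothed velocity signal
    g = 1.0 / (1.0 + np.exp(-kappa * (np.abs(v) - b)))    # sigmoid gate
    return g * v                                          # modulated treatment

# Flat SSH yields a zero treatment; a sharp SSH jump passes through the gate.
flat = build_treatment(np.zeros(10))
jump = build_treatment(np.concatenate([np.zeros(5), np.ones(5)]))
```

Because the gate multiplies the velocity rather than replacing it, small SSH fluctuations are attenuated smoothly instead of being hard-thresholded.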
3. Latent-Space Confounding Control via Maximum Mean Discrepancy
To mitigate bias from latent confounding, KGCM-VAE regularizes the latent variables so that treated ($a_t = 1$) and control ($a_t = 0$) encodings are balanced. Denote $Z_1 = \{z_t : a_t = 1\}$ and $Z_0 = \{z_t : a_t = 0\}$. The squared maximum mean discrepancy is computed as
$$\mathrm{MMD}^2(Z_1, Z_0) = \big\| \mathbb{E}_{z \sim Z_1}[\phi(z)] - \mathbb{E}_{z \sim Z_0}[\phi(z)] \big\|_{\mathcal{H}}^2,$$
where $\phi$ is the implicit feature map induced by, e.g., a radial basis function (RBF) kernel. An empirical estimate of the MMD is used to penalize distributional discrepancies, promoting covariate balance in the latent space.
4. Causal Adjacency-Constrained Decoder
KGCM-VAE’s decoder contains an internal adjacency mask $M$, thresholded from learnable logits, that enforces alignment with a fixed physical adjacency matrix $A_{\mathrm{phys}}$. This physical matrix encodes:
- Contemporaneous causal edges (effects within the same time step).
- Lagged causal edges (effects propagated from earlier time steps).
- Additional domain-informed pathways (e.g., SSH/velocity to ice thickness).
In decoding, each GRU unit $j$ receives as input only the “allowed” features, $\tilde{x}_t^{(j)} = M_{j,:} \odot x_t$. A Frobenius-norm penalty $\|M - A_{\mathrm{phys}}\|_F^2$ is applied to ensure that the learned adjacency remains close to the established physical structure.
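The masking and penalty can be sketched in a few lines of NumPy. The hard threshold on the logits is a simplifying assumption (a trainable model would use a relaxation such as a sigmoid), and the matrix shapes are illustrative.

```python
import numpy as np

def threshold_mask(logits, tau=0.0):
    """Binary adjacency mask from learnable logits (hard threshold for the sketch)."""
    return (logits > tau).astype(float)

def masked_inputs(M, x):
    """Row j of the result is the feature vector visible to decoder unit j."""
    return M * x[None, :]          # broadcast: (d, d) mask times (d,) features

def adjacency_penalty(M, A_phys):
    """Frobenius-norm penalty keeping the learned mask near the physical adjacency."""
    return np.sum((M - A_phys) ** 2)
```

Each decoder unit thus sees only the parents allowed by its row of the mask, and the penalty pulls the learned connectivity toward the domain-specified structure.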
5. Objective Function and Training
The KGCM-VAE objective consists of three terms, an ELBO plus two regularizers:
$$\mathcal{L} = \mathcal{L}_{\mathrm{ELBO}} + \lambda_{\mathrm{MMD}}\, \mathrm{MMD}^2(Z_1, Z_0) + \lambda_A \|M - A_{\mathrm{phys}}\|_F^2$$
- $\lambda_{\mathrm{MMD}}$: coefficient for latent balancing.
- $\lambda_A$: coefficient enforcing physical adjacency.
- Latent dimension is set to 32.
Training employs a one-layer Bi-GRU encoder over input windows, a uni-directional GRU decoder with masked inputs, the Adam optimizer, batch size 64, 100 training epochs, ReLU activations, and a kernel bandwidth of 1.0 for the MMD term.
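How the three terms combine can be sketched as follows. The ELBO surrogate here (reconstruction error plus a closed-form KL against a standard-normal prior) and the coefficient values are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def kgcm_vae_loss(recon_err, mu, log_var, mmd_sq, adj_pen,
                  lam_mmd=1.0, lam_adj=1.0):
    """Three-term objective: ELBO surrogate + latent balancing + adjacency penalty."""
    elbo_term = recon_err + gaussian_kl(mu, log_var)
    return elbo_term + lam_mmd * mmd_sq + lam_adj * adj_pen
```

In practice the two coefficients trade off reconstruction quality against causal regularization, which is exactly the trade-off explored in the ablations below.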
6. Experimental Protocol and Evaluation
Experiments are conducted on both synthetic and real-world Arctic datasets:
- Synthetic counterfactuals, generated with parameters controlling treatment nonlinearity and centrality.
- Real data: ECMWF S2S reanalysis (Jan 2020–Jun 2024), daily averages over 60°–90°N, comprising sea ice thickness, SSH, and velocity components.
Evaluation metrics include test RMSE (forecasting error) and test PEHE (precision in estimation of heterogeneous effects):
| Model | Test RMSE | Test PEHE |
|---|---|---|
| KGCM-VAE | 0.3225 | 3.8159 |
| R-CRN | 0.2034 | 3.8567 |
| CF-RNN | 0.2280 | 3.8599 |
| Causal-TaRNet | 0.2008 | 3.8920 |
Ablation studies reveal that combining the MMD and adjacency constraints produces a 1.88% reduction in PEHE over the backbone. The optimal KGCM-VAE configuration provides a 1.06% PEHE improvement over R-CRN and a 1.95% gain over Causal-TaRNet, at the cost of a higher test RMSE: a deliberate trade-off favoring causal fidelity over forecasting accuracy (Sampath et al., 25 Jan 2026).
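The relative PEHE gains quoted above follow directly from the table; a quick arithmetic check:

```python
# Relative PEHE improvement of KGCM-VAE over two baselines (values from the table).
pehe = {"KGCM-VAE": 3.8159, "R-CRN": 3.8567, "Causal-TaRNet": 3.8920}

def improvement(baseline):
    """Percentage reduction in PEHE relative to the given baseline."""
    return 100.0 * (pehe[baseline] - pehe["KGCM-VAE"]) / pehe[baseline]

vs_rcrn = improvement("R-CRN")            # ~1.06 %
vs_tarnet = improvement("Causal-TaRNet")  # ~1.95 %
```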
7. Implications, Limitations, and Future Directions
KGCM-VAE demonstrates that physically grounded, knowledge-guided causal inference in deep generative models can yield superior treatment effect estimation in domains with strong confounding and structural constraints. The integration of velocity modulation, latent space balancing, and physically-constrained decoding enforces faithful causality, mitigates spurious correlation, and provides interpretable, actionable representations.
Several limitations persist:
- Extension to high-dimensional gridded spatial fields remains open.
- Adaptive selection of the MMD kernel bandwidth is still required.
- Automated discovery of the physical adjacency from data is not addressed.
A plausible implication is that the knowledge-guided modeling principle underlying KGCM-VAE is transferable to other domains where domain-informed causal structures must be imposed within deep generative frameworks (Sampath et al., 25 Jan 2026).