KGCM-VAE: Knowledge-Guided Causal VAE

Updated 1 February 2026

The paper introduces KGCM-VAE, a model that estimates individualized treatment effects using a velocity modulation scheme to incorporate physical constraints.
It balances latent representations with Maximum Mean Discrepancy and enforces causal structure via an adjacency-constrained decoder reflecting domain knowledge.
Experimental results on Arctic sea ice data show improved causal effect estimation and forecasting performance compared to baseline models.

The Knowledge-Guided Causal Model Variational Autoencoder (KGCM-VAE) is a sequential variational inference framework developed for quantifying causal effects in time-varying systems with strong physical constraints and latent confounding. Designed for applications such as Arctic sea ice dynamics, KGCM-VAE implements a velocity modulation scheme to create physically grounded causal treatments, employs Maximum Mean Discrepancy (MMD) for latent space balancing, and enforces a causal adjacency-constrained decoder reflecting domain knowledge. This results in robust estimation of individualized treatment effects (ITE) and improved causal mechanism discovery in spatiotemporal data settings (Sampath et al., 25 Jan 2026).

1. Formal Structure and Model Architecture

KGCM-VAE models the temporal evolution of observed, latent, and treatment variables at each time step $t$ :

$x_t \in \mathbb{R}^p$ : observed covariates (e.g., sea ice thickness, SSH, velocity components).
$z_t \in \mathbb{R}^{d_z}$ : latent system state encoding unobserved confounders.
$v_t \in \mathbb{R}^p$ : time-varying, physically constructed treatment vector.

The generative model is factorized as: $p_\theta(x_t, z_t \mid z_{t-1}, v_t) = p_\theta(z_t \mid z_{t-1}, v_t)\, p_\theta(x_t \mid z_t)$ with the following parameterizations:

$p_\theta(z_t \mid z_{t-1}, v_t)$ : a GRU cell maps $(z_{t-1}, v_t)$ to a Gaussian prior for $z_t$ .
$p_\theta(x_t \mid z_t)$ : an MLP decoder reconstructs $x_t$ from $z_t$ .

The inference (encoder) step is realized by: $q_\phi(z_t \mid x_t, z_{t-1}, v_t)$ which is implemented as a GRU-based Gaussian encoder.

2. Velocity Modulation and Treatment Construction

KGCM-VAE introduces a knowledge-guided, physically interpretable mechanism for treatment assignment. The causal treatment $v_t$ is constructed from the sea surface height (SSH) as follows:

Compute the SSH increment:

$\Delta \mathrm{SSH}_t = \mathrm{SSH}_t - \mathrm{SSH}_{t-1}$

Apply temporal smoothing (e.g., moving average) to obtain velocity signal:

$V_t = \mathrm{Smooth}(\Delta \mathrm{SSH}_t)$

Apply sigmoid gating to modulate velocity according to SSH transitions:

$\sigma_t = \frac{1}{1 + \exp[-\alpha \Delta \mathrm{SSH}_t - b]}$

where $\alpha$ (steepness) and $b$ (bias/threshold) are hyperparameters.

Set the treatment vector:

$v_t = \sigma_t \odot V_t$

ensuring that only high-magnitude, physically relevant SSH changes generate significant “intervention” signals.

This scheme encodes both domain knowledge and ensures that causal inference respects physically plausible interventions in ocean–ice dynamics.

3. Latent-Space Confounding Control via Maximum Mean Discrepancy

To mitigate bias from latent confounding, KGCM-VAE regularizes the latent variables so that treated ( $v=1$ ) and control ( $v=0$ ) encodings are balanced. Denote $P_{\mathrm{tr}} = q_\phi(z \mid X, v=1)$ , $P_{\mathrm{ctr}} = q_\phi(z \mid X, v=0)$ . The squared maximum mean discrepancy is computed as: $\mathrm{MMD}^2(P_{\mathrm{ctr}}, P_{\mathrm{tr}}) = \left\| \mathbb{E}_{z \sim P_{\mathrm{tr}}}[\phi(z)] - \mathbb{E}_{z \sim P_{\mathrm{ctr}}}[\phi(z)] \right\|^2_{\mathcal H}$ where $\phi$ is the implicit feature map induced by, e.g., a radial basis function (RBF) kernel. Empirical estimation of MMD is used to penalize distributional discrepancies, promoting covariate balance in the latent space.

4. Causal Adjacency-Constrained Decoder

KGCM-VAE’s decoder contains an internal adjacency mask $A \in \{0,1\}^{p \times p}$ , thresholded from learnable logits, that enforces alignment with a fixed physical adjacency matrix $A_{\mathrm{phys}}$ . This physical matrix encodes:

Contemporaneous causal edges (e.g., $T_t \to Y_{t+1}$ ).
Lagged causal edges (e.g., $T_{t-\ell} \to Y_{t+1}$ ).
Additional domain-informed pathways (e.g., SSH/velocity to ice thickness).

In decoding, each GRU unit $i$ receives as input only the “allowed” features: $\mathrm{Input}_i = [A_{i, \cdot} \odot x_{t-1},\, z_t]$ A Frobenius-norm penalty $\lambda_A \|A - A_{\mathrm{phys}}\|_F^2$ is applied to ensure that the learned adjacency remains close to the established physical structure.

5. Objective Function and Training

The KGCM-VAE objective consists of three terms: $\mathcal{L}(\theta, \phi) = \sum_{t=1}^T \left\{ \mathbb{E}_{q_\phi(z_t\mid\cdot)}[\log p_\theta(x_t \mid z_t)] - \mathrm{KL}(q_\phi(z_t \mid x_t, z_{t-1}, v_t) \| p_\theta(z_t \mid z_{t-1}, v_t)) \right\} - \lambda_{\mathrm{MMD}}\, \mathrm{MMD}^2 + \lambda_A\, \|A - A_{\mathrm{phys}}\|_F^2$

$\lambda_{\mathrm{MMD}}$ : coefficient for latent balancing.
$\lambda_A$ : coefficient enforcing physical adjacency.
Latent dimension $d_z$ is set to 32.

Training employs a one-layer Bi-GRU encoder over input windows, uni-directional GRU decoder with masked inputs, Adam optimizer (learning rate $10^{-3}$ ), batch size 64, 100 training epochs, ReLU activations, and a kernel bandwidth of 1.0 for the MMD term.

6. Experimental Protocol and Evaluation

Experiments are conducted on both synthetic and real-world Arctic datasets:

Synthetic counterfactuals:

$Y_{1,t} = Y_{0,t} + \beta \tanh[\alpha (T_{1,t} - T_{0,t} - \mu_T)] + \epsilon$

with parameters controlling treatment nonlinearity and centrality.

Real data: ECMWF S2S reanalysis (Jan 2020–Jun 2024), daily averages over 60°–90°N, comprising sea ice thickness, SSH, and velocity components.

Evaluation metrics include:

Model	Test RMSE	Test PEHE
KGCM-VAE	0.3225	3.8159
R-CRN	0.2034	3.8567
CF-RNN	0.2280	3.8599
Causal-TaRNet	0.2008	3.8920

Ablation studies reveal that combining both MMD and adjacency constraints produces a 1.88% reduction in PEHE over the backbone. The optimal KGCM-VAE configuration provides a 1.06% PEHE improvement over R-CRN and a 1.95% gain over Causal-TaRNet, representing a deliberate preference for causal fidelity versus marginal improvements in forecasting accuracy (Sampath et al., 25 Jan 2026).

7. Implications, Limitations, and Future Directions

KGCM-VAE demonstrates that physically grounded, knowledge-guided causal inference in deep generative models can yield superior treatment effect estimation in domains with strong confounding and structural constraints. The integration of velocity modulation, latent space balancing, and physically-constrained decoding enforces faithful causality, mitigates spurious correlation, and provides interpretable, actionable representations.

Several limitations persist:

Extension to high-dimensional gridded spatial fields remains open.
Better adaptation for kernel bandwidth selection in MMD is required.
Automated discovery of the physical adjacency $A_{\mathrm{phys}}$ from data is not addressed.

A plausible implication is that the knowledge-guided modeling principle underlying KGCM-VAE is transferrable to other domains where domain-informed causal structures must be imposed within deep generative frameworks (Sampath et al., 25 Jan 2026).

Markdown Report Issue Upgrade to Chat

References (1)

Time-Varying Causal Treatment for Quantifying the Causal Effect of Short-Term Variations on Arctic Sea Ice Dynamics (2026)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to Knowledge-Guided Causal Model Variational Autoencoder (KGCM-VAE).