Deep Structural Causal Models for Meshes
- Deep Structural Causal Models for meshes are frameworks that combine geometric deep learning with causal modeling to enable interventional and counterfactual analysis of 3D shapes.
- They employ spectral convolutions, normalizing flows, and variational inference to capture complex anatomical and multimodal features within mesh data.
- Recent extensions integrate multiscale and multimodal approaches to disentangle causal from non-causal factors, thereby enhancing interpretability and predictive accuracy.
Deep Structural Causal Models (DSCMs) for meshes integrate geometric deep learning with structural causal modeling to enable interventional and counterfactual inference on 3D mesh data. These methods allow for explicit causal interpretation and manipulation of high-dimensional shape representations, addressing questions about the effects of specific factors (e.g., age, sex, genetics) on anatomic structures. Recent developments extend DSCMs to incorporate multiscale and multimodal data and to disentangle causal from non-causal latent factors in mesh-based representations (Rasal et al., 2022, Xia et al., 12 Dec 2025).
1. Structural Causal Model Specification for Meshes
The DSCM framework for 3D meshes is formally defined by an explicit causal graph, nodes, and corresponding structural equations. In applications to neuroanatomical mesh data, typical nodes include demographic and morphological covariates (age $a$, sex $s$, total brain volume $b$, substructure volume $v$), the mesh itself ($x$), and independent exogenous noise variables $\epsilon_{(\cdot)}$. The directed acyclic graph (DAG) specifies relationships such as $(a, s) \to b$, $(a, b) \to v$, and $(b, v) \to x$; all noise terms are assumed mutually independent (Rasal et al., 2022). The structural equations take the form

$$a = f_A(\epsilon_A), \quad s = f_S(\epsilon_S), \quad b = f_B(\epsilon_B; a, s), \quad v = f_V(\epsilon_V; a, b), \quad x = f_X(\epsilon_X; b, v),$$

where $f_A$, $f_B$, and $f_V$ are low-dimensional (conditional) normalizing flows, and $f_X$ is a high-dimensional mapping implemented via a spectral graph convolutional variational autoencoder (VAE). For multimodal and multiscale data, the causal graph is extended to include modality-specific latent factors (e.g., one each for the cortical mesh and the connectome), as well as downstream phenotypes, with structural equations that model modality interactions and potential interventions (Xia et al., 12 Dec 2025).
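A toy forward-sampling pass over the covariate-level mechanisms makes the graph concrete. The affine maps and all numeric coefficients below are illustrative stand-ins for the learned normalizing flows, not the fitted mechanisms from either paper:

```python
import numpy as np

def sample_scm(n, rng):
    """Forward-sample the covariate DAG: (a, s) -> b, (a, b) -> v.

    Affine maps with additive noise stand in for the learned
    conditional normalizing flows f_A, f_B, f_V (illustrative only).
    """
    eps_a = rng.normal(size=n)
    eps_s = rng.uniform(size=n)
    eps_b = rng.normal(size=n)
    eps_v = rng.normal(size=n)

    a = 55.0 + 8.0 * eps_a                                    # age:  a = f_A(eps_a)
    s = (eps_s < 0.5).astype(float)                           # sex:  s = f_S(eps_s)
    b = 1200.0 + 2.0 * (a - 55.0) + 100.0 * s + 30.0 * eps_b  # b = f_B(eps_b; a, s)
    v = 20.0 + 0.01 * b + 0.05 * a + 1.5 * eps_v              # v = f_V(eps_v; a, b)
    return a, s, b, v

a, s, b, v = sample_scm(1000, np.random.default_rng(0))
```

Because each mechanism is a deterministic function of its parents plus an independent noise term, sampling in topological order yields valid draws from the joint distribution.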
2. Spectral Geometric Parameterization and Mesh Operators
High-fidelity shape modeling in DSCMs relies on geometric deep learning modules that respect the underlying graph structure of meshes. For surface meshes, each shape is represented as a vertex matrix $x \in \mathbb{R}^{N \times 3}$, where $N$ denotes the number of vertices. Graph Laplacians (one for the mesh, one for the connectome) are constructed from mesh connectivity or multimodal structural adjacency matrices. Chebyshev spectral convolutions, as introduced by Defferrard et al., are the primary operator for feature extraction, using order-$K$ polynomial filters:

$$y = \sum_{k=0}^{K-1} \theta_k \, T_k(\tilde{L}) \, x,$$

where $T_k$ are Chebyshev polynomials, $\theta_k$ are learnable filter coefficients, and $\tilde{L} = 2L/\lambda_{\max} - I$ is the rescaled Laplacian. For multimodal integration, Laplacian harmonics and spectral graph attention modules extract aligned representations, fusing mesh and connectome information into a shared latent space (Xia et al., 12 Dec 2025).
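A minimal dense implementation of the Chebyshev convolution illustrates the recurrence; production models use sparse, batched GPU operators, so this sketch is for exposition only:

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(A.shape[0]) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]

def cheb_conv(x, A, theta):
    """Chebyshev spectral convolution: sum_k theta_k T_k(L_tilde) x.

    x:     (N, F_in) vertex features
    A:     (N, N) symmetric adjacency matrix
    theta: (K, F_in, F_out) filter coefficients
    """
    L = normalized_laplacian(A)
    lmax = np.linalg.eigvalsh(L).max()
    L_tilde = (2.0 / lmax) * L - np.eye(L.shape[0])   # rescale spectrum to [-1, 1]
    K = theta.shape[0]
    Tx = [x, L_tilde @ x]                             # T_0 x and T_1 x
    for k in range(2, K):
        Tx.append(2.0 * (L_tilde @ Tx[-1]) - Tx[-2])  # T_k = 2 L~ T_{k-1} - T_{k-2}
    return sum(Tx[k] @ theta[k] for k in range(K))

# Tiny example on a 4-vertex path graph:
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
out = cheb_conv(rng.normal(size=(4, 2)), A, rng.normal(size=(3, 2, 5)))
```

The recurrence avoids explicit eigendecomposition of the filter (only $\lambda_{\max}$ is needed), which is what makes spectral filtering tractable on large meshes.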
Pooling and unpooling on meshes use quadric-error contraction (Simplify), enabling scalable encoding and decoding. Global Procrustes/Kabsch–Umeyama alignment is standard for pose invariance during preprocessing.
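The Procrustes/Kabsch–Umeyama alignment step can be sketched with the standard SVD-based solution; the reflection guard keeps the recovered transform a proper rotation:

```python
import numpy as np

def kabsch_align(P, Q):
    """Rigidly align point set P onto Q (Kabsch-Umeyama: rotation + translation).

    P, Q: (N, 3) arrays of corresponding vertices. Returns the aligned copy of P.
    """
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    H = (P - mu_p).T @ (Q - mu_q)             # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (P - mu_p) @ R.T + mu_q

# Recover an exact rigid motion (rotation about z plus translation):
rng = np.random.default_rng(1)
P = rng.normal(size=(10, 3))
c, s = np.cos(0.7), np.sin(0.7)
R0 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
Q = P @ R0.T + np.array([1.0, -2.0, 0.5])
aligned = kabsch_align(P, Q)  # matches Q up to numerical precision
```

This alignment only removes global pose; it leaves the shape variation that the causal mechanisms are meant to explain.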
3. Inference, Variational Learning, and Causal Latent Disentanglement
Amortized variational inference is employed to approximate posteriors over high-dimensional latent codes. The encoder leverages spectral graph convolutions and mesh pooling to map from mesh space to a multivariate Gaussian latent, while the decoder inverts this mapping. The training objective maximizes the evidence lower bound (ELBO):

$$\mathcal{L} = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z, \mathrm{pa}_x)\right] - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),$$

with a reconstruction likelihood conditioned on the mesh's causal parents $\mathrm{pa}_x$ and a KL divergence to a standard Gaussian prior (Rasal et al., 2022).
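The ELBO can be sketched under an isotropic Gaussian mesh likelihood; the noise scale `sigma_x` and the flat per-example treatment are simplifying assumptions, not the papers' exact parameterization:

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def elbo(x, x_recon, mu, log_var, sigma_x=1.0):
    """Per-example ELBO with an isotropic Gaussian likelihood on vertices.

    x, x_recon: (N, 3) observed and reconstructed meshes.
    mu, log_var: (D,) parameters of the Gaussian posterior q(z | x).
    """
    log_lik = -0.5 * np.sum((x - x_recon) ** 2) / sigma_x**2 \
              - 0.5 * x.size * np.log(2.0 * np.pi * sigma_x**2)
    return log_lik - gaussian_kl(mu, log_var)

x = np.zeros((5, 3))
val = elbo(x, x, np.zeros(4), np.zeros(4))  # perfect reconstruction, prior-matched posterior
```

With a perfect reconstruction and a posterior equal to the prior, the KL term vanishes and only the Gaussian normalization constant remains.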
For multiscale causal modeling, encoders and disentanglement modules achieve separation of shared versus scale-specific factors. The joint latent supports explicit regularization: cross-modal similarity, orthogonality, and contrastive losses ensure that the shared code captures cross-modal commonality, whereas the specific code captures modality- or scale-unique features. Bilevel mutual information (MI) regularization further separates causal from non-causal factors: the MI between the causal latent and the downstream target is maximized, while the MI between the non-causal latent and the target is minimized and cross-modal causal MI is encouraged (Xia et al., 12 Dec 2025).
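Two of the regularizers named above have standard forms that can be sketched directly; the orthogonality penalty and an InfoNCE-style contrastive loss below are generic formulations, only schematically related to the exact La-MuSe objectives:

```python
import numpy as np

def orthogonality_loss(z_shared, z_specific):
    """Penalize overlap between shared and scale-specific codes (batch mean)."""
    dot = np.sum(z_shared * z_specific, axis=-1)
    return np.mean(dot**2)

def info_nce(z_a, z_b, tau=0.1):
    """Contrastive loss pulling matched cross-modal pairs together.

    z_a, z_b: (B, D) L2-normalized embeddings from two modalities;
    row i of z_a and row i of z_b form a positive pair.
    """
    logits = (z_a @ z_b.T) / tau                  # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # cross-entropy on the diagonal
```

Both losses are near zero exactly when the shared and specific codes are orthogonal and each embedding is closest to its cross-modal counterpart, which is the disentanglement behavior the text describes.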
4. Interventions and Counterfactuals in Mesh Space
DSCMs operationalize Pearl’s three-step process of abduction, action, and prediction for individual mesh instances. Abduction infers the exogenous noise variables from observed data. Action applies an intervention, e.g., $do(a := a')$, by modifying parent node values. Prediction propagates these changes through the structural equations, yielding counterfactual mesh outcomes (e.g., simulating how brain morphology would change at a different age or under different covariate settings). Population-level interventions sample noise from the prior and propagate it through the graph, while subject-specific counterfactuals use instance-specific inferred noise (Rasal et al., 2022). For multimodal or scale-specific counterfactuals, interventions can target specific causal latent components, supporting detailed hypothesis testing (Xia et al., 12 Dec 2025).
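The abduction/action/prediction recipe can be sketched with invertible affine mechanisms standing in for the learned flows (all coefficients are illustrative, chosen only so that the noise terms can be recovered in closed form):

```python
import numpy as np

def f_b(eps_b, a, s):   # b = f_B(eps_b; a, s): affine stand-in for a flow
    return 1200.0 + 2.0 * (a - 55.0) + 100.0 * s + 30.0 * eps_b

def f_v(eps_v, a, b):   # v = f_V(eps_v; a, b)
    return 20.0 + 0.01 * b + 0.05 * a + 1.5 * eps_v

def counterfactual_age(a_obs, s_obs, b_obs, v_obs, a_new):
    """Pearl's three steps under do(a := a_new) for one subject."""
    # 1. Abduction: invert the mechanisms to recover exogenous noise.
    eps_b = (b_obs - 1200.0 - 2.0 * (a_obs - 55.0) - 100.0 * s_obs) / 30.0
    eps_v = (v_obs - 20.0 - 0.01 * b_obs - 0.05 * a_obs) / 1.5
    # 2. Action: set age to the counterfactual value.
    a_cf = a_new
    # 3. Prediction: re-run the equations for all descendants of a.
    b_cf = f_b(eps_b, a_cf, s_obs)
    v_cf = f_v(eps_v, a_cf, b_cf)
    return b_cf, v_cf

b_cf, v_cf = counterfactual_age(a_obs=60.0, s_obs=1.0, b_obs=1350.0, v_obs=40.0, a_new=50.0)
```

A useful sanity check: intervening with the observed value ($a_{\text{new}} = a_{\text{obs}}$) must reproduce the observed data exactly, since the abducted noise is preserved, which mirrors the noise-preservation property reported for mesh counterfactuals.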
5. Implementation and Optimization Details
Mesh DSCMs encode meshes as fixed-topology vertex arrays (here, of the brain stem), use ELU activations, and vary the latent dimension in ablation studies. The CondEnc encoder includes stacks of ChebBlocks with interleaved Simplify pooling, while the decoder mirrors the encoder with upsampling. Training uses the Adam optimizer, with distinct learning rates for the covariate flows and the mesh CVAE. Stochastic variational inference (SVI) in Pyro supports efficient optimization, with one Monte Carlo sample per example. Data splits of 10,441 (train), 1,160 (validation), and 2,901 (test) subjects were reported (Rasal et al., 2022). For La-MuSe, optimization incorporates both inner-loop (latent) and outer-loop (parameter) objectives with bilevel unrolling, a batch size of 8, and 300 epochs (Xia et al., 12 Dec 2025). Mesh preprocessing and registration, Laplacian construction, and multi-scale alignment are prerequisites for both mono- and multimodal variants.
6. Experimental Evaluation: Datasets, Metrics, and Ablation
Evaluation leverages large MRI-derived datasets, such as 14,502 T1-weighted MRI subjects from the UK Biobank, extracting subcortical meshes registered to templates. Covariate ranges include age (40–70), binary sex, and volumetric measurements (Rasal et al., 2022). Quantitative metrics for mesh reconstruction include vertex Euclidean distance (VED), specificity under interventions, and compactness (e.g., explained variance ratios derived from PCA on reconstructions and counterfactuals). For downstream prediction, mean absolute error (MAE), root-mean-square error (RMSE), accuracy (ACC), and F1 score are used with 5-fold cross-validation (Xia et al., 12 Dec 2025).
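Two of the shape metrics can be sketched directly. The definitions below, mean per-vertex distance for VED and a PCA explained-variance ratio for compactness, are standard formulations assumed for illustration rather than taken verbatim from the papers:

```python
import numpy as np

def vertex_euclidean_distance(x, x_hat):
    """Mean per-vertex Euclidean distance between corresponding meshes (N, 3)."""
    return float(np.mean(np.linalg.norm(x - x_hat, axis=-1)))

def compactness(meshes, k):
    """Fraction of total variance explained by the first k PCA modes.

    meshes: (M, N, 3) array of corresponding meshes; each is flattened to one row.
    """
    X = meshes.reshape(meshes.shape[0], -1)
    Xc = X - X.mean(axis=0)
    _, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s**2
    return float(var[:k].sum() / var.sum())

# VED: a constant (0.3, 0.4, 0) vertex offset gives distance 0.5 everywhere.
ved = vertex_euclidean_distance(np.zeros((10, 3)), np.full((10, 3), 1.0) * np.array([0.3, 0.4, 0.0]))

# Compactness: shapes varying along a single direction are fully explained by one mode.
rng = np.random.default_rng(2)
base, d = rng.normal(size=(6, 3)), rng.normal(size=(6, 3))
c1 = compactness(np.stack([base + t * d for t in (0.0, 1.0, 2.0, 3.0)]), 1)
```

Lower VED indicates better reconstruction fidelity, while a compactness curve that rises quickly with $k$ indicates that the model's outputs occupy a low-dimensional, anatomically plausible shape space.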
Ablation studies confirm that spectral alignment, disentangled GVAE architecture, and mutual information-driven regularization are necessary for optimal performance. For example, removing Laplacian harmonics increases MAE by 0.56 and reduces ACC by 0.14; removing disentanglement or causal loss similarly degrades performance (Xia et al., 12 Dec 2025). Qualitative results include visualizations of high-fidelity reconstructions, population-level mesh sampling, targeted interventions, and subject-specific counterfactual trajectories, demonstrating preservation of exogenous noise and recovery of original meshes within sub-millimeter error.
7. Extensions to Multimodal, Multiscale, and Causally-Structured Mesh Models
Recent advances generalize DSCMs to crossover domains combining different mesh modalities and spatial/functional scales. For example, the La-MuSe framework incorporates both cortical mesh geometry and connectomic adjacency, combining these with spectral operators to generate multi-scale shared representations. Disentangled latent variables assigned to specific scales or modalities are regularized to achieve interpretability and control, essential for robust neuroscientific inference and phenotype prediction. Bilevel mutual information-regularized objectives ensure that only causally relevant factors drive predictions, while counterfactual interventions retain semantic meaning in the mesh domain (Xia et al., 12 Dec 2025).
A plausible implication is that these frameworks, by enabling direct interventional and counterfactual queries on physical shape data, are positioned to supplant purely associational statistical shape modeling approaches, especially in medical imaging and neuroscientific research. Future directions include extension to additional modalities, incorporating richer covariate graphs, and optimizing scalability for population-level studies.
| Model | Spectral Operators | Disentanglement | Causal Interventions |
|---|---|---|---|
| CSM (Rasal et al., 2022) | Chebyshev conv, Laplacian | Not explicit | Action/abduction |
| La-MuSe (Xia et al., 12 Dec 2025) | Laplacian harmonics, attention | Multi-scale, MI-loss | Multimodal/latent |
Both frameworks enable shape-based causal modeling on 3D meshes, with La-MuSe providing enhanced multiscale, modular, and MI-regularized extensions for interpretability and multi-modal integration.