Deep Structural Causal Models for Meshes
- Deep Structural Causal Models for meshes are frameworks that combine geometric deep learning with causal modeling to enable interventional and counterfactual analysis of 3D shapes.
- They employ spectral convolutions, normalizing flows, and variational inference to capture complex anatomical and multimodal features within mesh data.
- Recent extensions integrate multiscale and multimodal approaches to disentangle causal from non-causal factors, thereby enhancing interpretability and predictive accuracy.
Deep Structural Causal Models (DSCMs) for meshes integrate geometric deep learning with structural causal modeling to enable interventional and counterfactual inference on 3D mesh data. These methods allow for explicit causal interpretation and manipulation of high-dimensional shape representations, addressing questions about the effects of specific factors (e.g., age, sex, genetics) on anatomic structures. Recent developments extend DSCMs to incorporate multiscale and multimodal data and to disentangle causal from non-causal latent factors in mesh-based representations (Rasal et al., 2022, Xia et al., 12 Dec 2025).
1. Structural Causal Model Specification for Meshes
The DSCM framework for 3D meshes is formally defined by an explicit causal graph, nodes, and corresponding structural equations. In applications to neuroanatomical mesh data, typical nodes include demographic and morphological covariates (age $a$, sex $s$, total brain volume $b$, substructure volume $v$), the mesh itself ($x$), and independent exogenous noise variables $\epsilon_{(\cdot)}$. The directed acyclic graph (DAG) specifies relationships such as $(a, s) \to b$, $(a, b) \to v$, and $(b, v) \to x$; all noise terms are assumed mutually independent (Rasal et al., 2022). The structural equations take the form

$$a = f_A(\epsilon_A), \quad s = f_S(\epsilon_S), \quad b = f_B(\epsilon_B; a, s), \quad v = f_V(\epsilon_V; a, b), \quad x = f_X(\epsilon_X; b, v),$$

where $f_A$, $f_B$, and $f_V$ are low-dimensional (conditional) normalizing flows, and $f_X$ is a high-dimensional mapping implemented via a spectral graph convolutional variational autoencoder (VAE). For multimodal and multiscale data, the causal graph is extended to include modality-specific latent factors (e.g., one each for the cortical mesh and the connectome), as well as downstream phenotypes, with structural equations that model modality interactions and potential interventions (Xia et al., 12 Dec 2025).
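A toy forward-sampling pass over the covariate-level mechanisms makes the graph concrete. The affine maps and all numeric coefficients below are illustrative stand-ins for the learned normalizing flows, not the fitted mechanisms from either paper:

```python
import numpy as np

def sample_scm(n, rng):
    """Forward-sample the covariate DAG: (a, s) -> b, (a, b) -> v.

    Affine maps with additive noise stand in for the learned
    conditional normalizing flows f_A, f_B, f_V (illustrative only).
    """
    eps_a = rng.normal(size=n)
    eps_s = rng.uniform(size=n)
    eps_b = rng.normal(size=n)
    eps_v = rng.normal(size=n)

    a = 55.0 + 8.0 * eps_a                                    # age:  a = f_A(eps_a)
    s = (eps_s < 0.5).astype(float)                           # sex:  s = f_S(eps_s)
    b = 1200.0 + 2.0 * (a - 55.0) + 100.0 * s + 30.0 * eps_b  # b = f_B(eps_b; a, s)
    v = 20.0 + 0.01 * b + 0.05 * a + 1.5 * eps_v              # v = f_V(eps_v; a, b)
    return a, s, b, v

a, s, b, v = sample_scm(1000, np.random.default_rng(0))
```

Because each mechanism is a deterministic function of its parents plus an independent noise term, sampling in topological order yields valid draws from the joint distribution.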
2. Spectral Geometric Parameterization and Mesh Operators
High-fidelity shape modeling in DSCMs relies on geometric deep learning modules that respect the underlying graph structure of meshes. For surface meshes, each shape is represented as a vertex matrix $x \in \mathbb{R}^{N \times 3}$, where $N$ denotes the number of vertices. Graph Laplacians (one for the mesh, one for the connectome) are constructed from mesh connectivity or multimodal structural adjacency matrices. Chebyshev spectral convolutions, as introduced by Defferrard et al., are the primary operator for feature extraction, using order-$K$ polynomial filters:

$$y = \sum_{k=0}^{K-1} \theta_k \, T_k(\tilde{L}) \, x,$$

where $T_k$ are Chebyshev polynomials, $\theta_k$ are learnable filter coefficients, and $\tilde{L} = 2L/\lambda_{\max} - I$ is the rescaled Laplacian. For multimodal integration, Laplacian harmonics and spectral graph attention modules extract aligned representations, fusing mesh and connectome information into a shared latent space (Xia et al., 12 Dec 2025).
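A minimal dense implementation of the Chebyshev convolution illustrates the recurrence; production models use sparse, batched GPU operators, so this sketch is for exposition only:

```python
import numpy as np

def normalized_laplacian(A):
    """Symmetric normalized graph Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    return np.eye(A.shape[0]) - (d_inv_sqrt[:, None] * A) * d_inv_sqrt[None, :]

def cheb_conv(x, A, theta):
    """Chebyshev spectral convolution: sum_k theta_k T_k(L_tilde) x.

    x:     (N, F_in) vertex features
    A:     (N, N) symmetric adjacency matrix
    theta: (K, F_in, F_out) filter coefficients
    """
    L = normalized_laplacian(A)
    lmax = np.linalg.eigvalsh(L).max()
    L_tilde = (2.0 / lmax) * L - np.eye(L.shape[0])   # rescale spectrum to [-1, 1]
    K = theta.shape[0]
    Tx = [x, L_tilde @ x]                             # T_0 x and T_1 x
    for k in range(2, K):
        Tx.append(2.0 * (L_tilde @ Tx[-1]) - Tx[-2])  # T_k = 2 L~ T_{k-1} - T_{k-2}
    return sum(Tx[k] @ theta[k] for k in range(K))

# Tiny example on a 4-vertex path graph:
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
out = cheb_conv(rng.normal(size=(4, 2)), A, rng.normal(size=(3, 2, 5)))
```

The recurrence avoids explicit eigendecomposition of the filter (only $\lambda_{\max}$ is needed), which is what makes spectral filtering tractable on large meshes.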
Pooling and unpooling on meshes use quadric-error contraction (Simplify), enabling scalable encoding and decoding. Global Procrustes/Kabsch–Umeyama alignment is standard for pose invariance during preprocessing.
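The Procrustes/Kabsch–Umeyama alignment step can be sketched with the standard SVD-based solution; the reflection guard keeps the recovered transform a proper rotation:

```python
import numpy as np

def kabsch_align(P, Q):
    """Rigidly align point set P onto Q (Kabsch-Umeyama: rotation + translation).

    P, Q: (N, 3) arrays of corresponding vertices. Returns the aligned copy of P.
    """
    mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
    H = (P - mu_p).T @ (Q - mu_q)             # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (P - mu_p) @ R.T + mu_q

# Recover an exact rigid motion (rotation about z plus translation):
rng = np.random.default_rng(1)
P = rng.normal(size=(10, 3))
c, s = np.cos(0.7), np.sin(0.7)
R0 = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
Q = P @ R0.T + np.array([1.0, -2.0, 0.5])
aligned = kabsch_align(P, Q)  # matches Q up to numerical precision
```

This alignment only removes global pose; it leaves the shape variation that the causal mechanisms are meant to explain.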
3. Inference, Variational Learning, and Causal Latent Disentanglement
Amortized variational inference is employed to approximate posteriors over high-dimensional latent codes. The encoder leverages spectral graph convolutions and mesh pooling to map from mesh space to a multivariate Gaussian latent, while the decoder inverts this mapping. The training objective maximizes the evidence lower bound (ELBO):

$$\mathcal{L} = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z, \mathrm{pa}_x)\right] - \mathrm{KL}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right),$$

with a reconstruction likelihood conditioned on the mesh's causal parents $\mathrm{pa}_x$ and a KL divergence to a standard Gaussian prior (Rasal et al., 2022).
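The ELBO can be sketched under an isotropic Gaussian mesh likelihood; the noise scale `sigma_x` and the flat per-example treatment are simplifying assumptions, not the papers' exact parameterization:

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def elbo(x, x_recon, mu, log_var, sigma_x=1.0):
    """Per-example ELBO with an isotropic Gaussian likelihood on vertices.

    x, x_recon: (N, 3) observed and reconstructed meshes.
    mu, log_var: (D,) parameters of the Gaussian posterior q(z | x).
    """
    log_lik = -0.5 * np.sum((x - x_recon) ** 2) / sigma_x**2 \
              - 0.5 * x.size * np.log(2.0 * np.pi * sigma_x**2)
    return log_lik - gaussian_kl(mu, log_var)

x = np.zeros((5, 3))
val = elbo(x, x, np.zeros(4), np.zeros(4))  # perfect reconstruction, prior-matched posterior
```

With a perfect reconstruction and a posterior equal to the prior, the KL term vanishes and only the Gaussian normalization constant remains.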
For multiscale causal modeling, encoders and disentanglement modules achieve separation of shared versus scale-specific factors. The joint latent supports explicit regularization: cross-modal similarity, orthogonality, and contrastive losses ensure that the shared code captures cross-modal commonality, whereas the specific code captures modality- or scale-unique features. Bilevel mutual information (MI) regularization further separates causal from non-causal factors: the MI between the causal latent and the downstream target is maximized, while the MI between the non-causal latent and the target is minimized and cross-modal causal MI is encouraged (Xia et al., 12 Dec 2025).
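Two of the regularizers named above have standard forms that can be sketched directly; the orthogonality penalty and an InfoNCE-style contrastive loss below are generic formulations, only schematically related to the exact La-MuSe objectives:

```python
import numpy as np

def orthogonality_loss(z_shared, z_specific):
    """Penalize overlap between shared and scale-specific codes (batch mean)."""
    dot = np.sum(z_shared * z_specific, axis=-1)
    return np.mean(dot**2)

def info_nce(z_a, z_b, tau=0.1):
    """Contrastive loss pulling matched cross-modal pairs together.

    z_a, z_b: (B, D) L2-normalized embeddings from two modalities;
    row i of z_a and row i of z_b form a positive pair.
    """
    logits = (z_a @ z_b.T) / tau                  # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # cross-entropy on the diagonal
```

Both losses are near zero exactly when the shared and specific codes are orthogonal and each embedding is closest to its cross-modal counterpart, which is the disentanglement behavior the text describes.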
4. Interventions and Counterfactuals in Mesh Space
DSCMs operationalize Pearl’s three-step process of abduction, action, and prediction for individual mesh instances. Abduction infers the exogenous noise variables from observed data. Action applies an intervention, e.g., $do(a := a')$, by modifying parent node values. Prediction propagates these changes through the structural equations, yielding counterfactual mesh outcomes (e.g., simulating how brain morphology would change at a different age or under different covariate settings). Population-level interventions sample noise from the prior and propagate it through the graph, while subject-specific counterfactuals use instance-specific inferred noise (Rasal et al., 2022). For multimodal or scale-specific counterfactuals, interventions can target specific causal latent components, supporting detailed hypothesis testing (Xia et al., 12 Dec 2025).
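The abduction/action/prediction recipe can be sketched with invertible affine mechanisms standing in for the learned flows (all coefficients are illustrative, chosen only so that the noise terms can be recovered in closed form):

```python
import numpy as np

def f_b(eps_b, a, s):   # b = f_B(eps_b; a, s): affine stand-in for a flow
    return 1200.0 + 2.0 * (a - 55.0) + 100.0 * s + 30.0 * eps_b

def f_v(eps_v, a, b):   # v = f_V(eps_v; a, b)
    return 20.0 + 0.01 * b + 0.05 * a + 1.5 * eps_v

def counterfactual_age(a_obs, s_obs, b_obs, v_obs, a_new):
    """Pearl's three steps under do(a := a_new) for one subject."""
    # 1. Abduction: invert the mechanisms to recover exogenous noise.
    eps_b = (b_obs - 1200.0 - 2.0 * (a_obs - 55.0) - 100.0 * s_obs) / 30.0
    eps_v = (v_obs - 20.0 - 0.01 * b_obs - 0.05 * a_obs) / 1.5
    # 2. Action: set age to the counterfactual value.
    a_cf = a_new
    # 3. Prediction: re-run the equations for all descendants of a.
    b_cf = f_b(eps_b, a_cf, s_obs)
    v_cf = f_v(eps_v, a_cf, b_cf)
    return b_cf, v_cf

b_cf, v_cf = counterfactual_age(a_obs=60.0, s_obs=1.0, b_obs=1350.0, v_obs=40.0, a_new=50.0)
```

A useful sanity check: intervening with the observed value ($a_{\text{new}} = a_{\text{obs}}$) must reproduce the observed data exactly, since the abducted noise is preserved, which mirrors the noise-preservation property reported for mesh counterfactuals.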
5. Implementation and Optimization Details
Mesh DSCMs encode meshes as fixed-topology vertex arrays (here, of the brain stem), use ELU activations, and vary the latent dimension in ablation studies. The CondEnc encoder includes stacks of ChebBlocks with interleaved Simplify pooling, while the decoder mirrors the encoder with upsampling. Training uses the Adam optimizer, with distinct learning rates for the covariate flows and the mesh CVAE. Stochastic variational inference (SVI) in Pyro supports efficient optimization, with one Monte Carlo sample per example. Data splits of 10,441 (train), 1,160 (validation), and 2,901 (test) subjects were reported (Rasal et al., 2022). For La-MuSe, optimization incorporates both inner-loop (latent) and outer-loop (parameter) objectives with bilevel unrolling, a batch size of 8, and 300 epochs (Xia et al., 12 Dec 2025). Mesh preprocessing and registration, Laplacian construction, and multi-scale alignment are prerequisites for both mono- and multimodal variants.
6. Experimental Evaluation: Datasets, Metrics, and Ablation
Evaluation leverages large MRI-derived datasets, such as 14,502 T1-weighted MRI subjects from the UK Biobank, extracting subcortical meshes registered to templates. Covariate ranges include age (40–70), binary sex, and volumetric measurements (Rasal et al., 2022). Quantitative metrics for mesh reconstruction include vertex Euclidean distance (VED), specificity under interventions, and compactness (e.g., explained variance ratios derived from PCA on reconstructions and counterfactuals). For downstream prediction, mean absolute error (MAE), root-mean-square error (RMSE), accuracy (ACC), and F1 score are used with 5-fold cross-validation (Xia et al., 12 Dec 2025).
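Two of the shape metrics can be sketched directly. The definitions below, mean per-vertex distance for VED and a PCA explained-variance ratio for compactness, are standard formulations assumed for illustration rather than taken verbatim from the papers:

```python
import numpy as np

def vertex_euclidean_distance(x, x_hat):
    """Mean per-vertex Euclidean distance between corresponding meshes (N, 3)."""
    return float(np.mean(np.linalg.norm(x - x_hat, axis=-1)))

def compactness(meshes, k):
    """Fraction of total variance explained by the first k PCA modes.

    meshes: (M, N, 3) array of corresponding meshes; each is flattened to one row.
    """
    X = meshes.reshape(meshes.shape[0], -1)
    Xc = X - X.mean(axis=0)
    _, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s**2
    return float(var[:k].sum() / var.sum())

# VED: a constant (0.3, 0.4, 0) vertex offset gives distance 0.5 everywhere.
ved = vertex_euclidean_distance(np.zeros((10, 3)), np.full((10, 3), 1.0) * np.array([0.3, 0.4, 0.0]))

# Compactness: shapes varying along a single direction are fully explained by one mode.
rng = np.random.default_rng(2)
base, d = rng.normal(size=(6, 3)), rng.normal(size=(6, 3))
c1 = compactness(np.stack([base + t * d for t in (0.0, 1.0, 2.0, 3.0)]), 1)
```

Lower VED indicates better reconstruction fidelity, while a compactness curve that rises quickly with $k$ indicates that the model's outputs occupy a low-dimensional, anatomically plausible shape space.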
Ablation studies confirm that spectral alignment, disentangled GVAE architecture, and mutual information-driven regularization are necessary for optimal performance. For example, removing Laplacian harmonics increases MAE by 0.56 and reduces ACC by 0.14; removing disentanglement or causal loss similarly degrades performance (Xia et al., 12 Dec 2025). Qualitative results include visualizations of high-fidelity reconstructions, population-level mesh sampling, targeted interventions, and subject-specific counterfactual trajectories, demonstrating preservation of exogenous noise and recovery of original meshes within sub-millimeter error.
7. Extensions to Multimodal, Multiscale, and Causally-Structured Mesh Models
Recent advances generalize DSCMs to crossover domains combining different mesh modalities and spatial/functional scales. For example, the La-MuSe framework incorporates both cortical mesh geometry and connectomic adjacency, combining these with spectral operators to generate multi-scale shared representations. Disentangled latent variables assigned to specific scales or modalities are regularized to achieve interpretability and control, essential for robust neuroscientific inference and phenotype prediction. Bilevel mutual information-regularized objectives ensure that only causally relevant factors drive predictions, while counterfactual interventions retain semantic meaning in the mesh domain (Xia et al., 12 Dec 2025).
A plausible implication is that these frameworks, by enabling direct interventional and counterfactual queries on physical shape data, are positioned to supplant purely associational statistical shape modeling approaches, especially in medical imaging and neuroscientific research. Future directions include extension to additional modalities, incorporating richer covariate graphs, and optimizing scalability for population-level studies.
| Model | Spectral Operators | Disentanglement | Causal Interventions |
|---|---|---|---|
| CSM (Rasal et al., 2022) | Chebyshev conv, Laplacian | Not explicit | Action/abduction |
| La-MuSe (Xia et al., 12 Dec 2025) | Laplacian harmonics, attention | Multi-scale, MI-loss | Multimodal/latent |
Both frameworks enable shape-based causal modeling on 3D meshes, with La-MuSe providing enhanced multiscale, modular, and MI-regularized extensions for interpretability and multi-modal integration.