Disentangling Hippocampal Shape Variations: A Study of Neurological Disorders Using Mesh Variational Autoencoder with Contrastive Learning

Published 31 Mar 2024 in cs.CV, cs.LG, and q-bio.NC | (2404.00785v3)

Abstract: This paper presents a comprehensive study focused on disentangling hippocampal shape variations from diffusion tensor imaging (DTI) datasets within the context of neurological disorders. Leveraging a Mesh Variational Autoencoder (VAE) enhanced with Supervised Contrastive Learning, our approach aims to improve interpretability by disentangling two distinct latent variables corresponding to age and the presence of diseases. In our ablation study, we investigate a range of VAE architectures and contrastive loss functions, showcasing the enhanced disentanglement capabilities of our approach. This evaluation uses synthetic 3D torus mesh data and real 3D hippocampal mesh datasets derived from the DTI hippocampal dataset. Our supervised disentanglement model outperforms several state-of-the-art (SOTA) methods like attribute and guided VAEs in terms of disentanglement scores. Our model distinguishes between age groups and disease status in patients with Multiple Sclerosis (MS) using the hippocampus data. Our Mesh VAE with Supervised Contrastive Learning shows the volume changes of the hippocampus of MS populations at different ages, and the result is consistent with the current neuroimaging literature. This research provides valuable insights into the relationship between neurological disorder and hippocampal shape changes in different age groups of MS populations using a Mesh VAE with Supervised Contrastive loss. Our code is available at https://github.com/Jakaria08/Explaining_Shape_Variability

Summary

The paper introduces a novel graph VAE that integrates supervised contrastive losses to effectively disentangle age and disease influences on 3D hippocampal meshes.
It leverages SpiralNet++ for mesh convolutions and demonstrates superior performance using SAP scores and reconstruction accuracy on synthetic and real datasets.
The method provides clinically interpretable models by clearly separating continuous (age) and categorical (MS) factors, with potential for broader anatomical studies.

Disentangling Hippocampal Shape Variations with Graph VAE and Supervised Contrastive Learning

Introduction and Motivation

This study presents a method for disentangling anatomical variation of the hippocampus in neuroimaging data, targeting the separation of age and disease (Multiple Sclerosis, MS) factors using a graph-based variational autoencoder (VAE) framework augmented with supervised contrastive learning. The primary motivation is to enable interpretability and factor-specific shape analysis in 3D mesh data, where traditional image-based or point cloud techniques are insufficient to capture the nuances of anatomical morphology in medical contexts. The approach employs a unified supervised contrastive loss to improve disentanglement of latent representations corresponding to continuous (age) and categorical (disease status) labels, facilitating fine-grained, clinically interpretable generative modeling of 3D hippocampal surfaces.

Methodology

Model Architecture

The network is a graph VAE in which both encoder and decoder are built on the mesh convolutional operator SpiralNet++. A 3D mesh of the hippocampus (or a synthetic shape) is represented by its vertices $X \in \mathbb{R}^{N \times 3}$, which are input to the encoder $f_\phi$, producing a latent code $z = [z_1, z_2, \dots, z_{d_z}]$. The decoder $f_\theta$ reconstructs the mesh from $z$. The VAE backbone follows the standard $\beta$-VAE formulation, incorporating both reconstruction and Kullback-Leibler divergence losses.
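As a rough illustration of the $\beta$-VAE objective mentioned above, the sketch below adds a reconstruction term to a $\beta$-weighted KL term for a diagonal Gaussian posterior. The function name `beta_vae_loss`, the use of mean squared error over vertex coordinates, and the default `beta` are assumptions made for this example; the source does not specify the paper's exact reconstruction loss or hyperparameters.

```python
import math

def beta_vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Hypothetical beta-VAE loss on a single mesh.

    x, x_hat : lists of (x, y, z) vertex coordinates (target and reconstruction)
    mu, logvar : mean and log-variance of the diagonal Gaussian posterior q(z|X)
    """
    # Reconstruction term: mean squared error over all vertex coordinates
    # (an assumption; the paper may use a different distance).
    n_coords = len(x) * 3
    recon = sum((a - b) ** 2
                for v, w in zip(x, x_hat)
                for a, b in zip(v, w)) / n_coords
    # Closed-form KL divergence between N(mu, diag(exp(logvar))) and N(0, I).
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                    for m, lv in zip(mu, logvar))
    return recon + beta * kl, recon, kl
```

Raising `beta` above 1 weights the KL term more heavily, which is the usual lever for trading reconstruction fidelity against a more factorized latent space.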

Figure 1: Overall architecture of the graph VAE with separate supervised contrastive losses for classification (disease/MS) and regression (age), with dedicated disentangled latent spaces for each factor.

Supervised Contrastive Loss

Unlike prior approaches such as Guided VAE and Attribute VAE, the proposed model integrates a supervised contrastive loss inspired by the Soft Nearest Neighbor Loss (SNNL) framework. Two loss terms are introduced: $L_{contr}^{cls}$ for classification (MS vs. healthy) and $L_{contr}^{reg}$ for regression (age). An excitation-inhibition mechanism is encoded in the denominator of the loss: it suppresses unintended correlations between non-target latent dimensions and the supervised factor, yielding enhanced selectivity and interpretability.

$L_{contr}^{cls}$ promotes proximity of $z_1$ among samples sharing the same binary label (e.g., MS status) and discourages proximity among others, while inhibiting correlations in other latent dimensions.
$L_{contr}^{reg}$ treats numerical labels in a contrastive regime using a proximity threshold, enforcing similarity in $z_2$ for samples with similar ages.
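To make the two terms above more concrete, here is a minimal sketch of a generic soft-nearest-neighbor-style loss on a single latent dimension. It is not the paper's exact formulation: the inhibition term acting on the other latent dimensions is omitted, and the names `soft_nearest_neighbor_loss`, `class_pairs`, `age_pairs`, `temperature`, and `threshold` are hypothetical. Positive pairs are defined by equal labels for the classification case and by an age gap within a threshold for the regression case.

```python
import math

def soft_nearest_neighbor_loss(z, is_positive, temperature=1.0, eps=1e-12):
    """Generic SNNL-style loss on scalar latent values.

    z : list of scalar latent values (e.g. z_1 or z_2 for each sample)
    is_positive(i, j) : True when samples i and j should be close
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        num = 0.0  # similarity mass on positive pairs
        den = 0.0  # similarity mass on all other samples
        for j in range(n):
            if j == i:
                continue
            w = math.exp(-((z[i] - z[j]) ** 2) / temperature)
            den += w
            if is_positive(i, j):
                num += w
        # A sample with no positive partner yields a large penalty here.
        total += -math.log((num + eps) / (den + eps))
    return total / n

def class_pairs(labels):
    # Classification case: positives share the same label (e.g. MS status).
    return lambda i, j: labels[i] == labels[j]

def age_pairs(ages, threshold):
    # Regression case: positives have ages within `threshold` of each other.
    return lambda i, j: abs(ages[i] - ages[j]) <= threshold
```

In this sketch the loss is small when samples that count as positives sit close together in the chosen latent dimension and grows when they are interleaved with negatives.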

The total loss sums the VAE, classification, and regression terms:

$L_{contr} = L_{vae} + L_{contr}^{cls} + L_{contr}^{reg}$

Data Pipeline and Preprocessing

Two datasets were processed: a synthetic 3D torus mesh dataset (with ground truth variability factors) and a high-resolution DTI hippocampus dataset. Hippocampal segmentation relied on a combination of automated and manual annotation, postprocessed via marching cubes isosurface extraction, Laplacian smoothing, alignment, and topological point correspondence using Deformetrica. All meshes were resampled for consistent topology enabling mesh convolution.
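One of the preprocessing steps named above, Laplacian smoothing, can be sketched in a few lines. The version below is the simplest uniform-weight variant: each vertex moves a fraction `lam` of the way toward the average of its mesh neighbors. The function name and parameters are assumptions for illustration; the source does not state which smoothing variant or settings were used.

```python
def laplacian_smooth(vertices, faces, lam=0.5, iterations=1):
    """Uniform-weight Laplacian smoothing of a triangle mesh.

    vertices : list of (x, y, z) tuples
    faces    : list of (i, j, k) vertex-index triples
    """
    # Build vertex adjacency from the triangle faces.
    neighbors = [set() for _ in vertices]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))

    verts = [tuple(v) for v in vertices]
    for _ in range(iterations):
        new_verts = []
        for i, v in enumerate(verts):
            if not neighbors[i]:
                new_verts.append(v)  # isolated vertex: leave unchanged
                continue
            k = len(neighbors[i])
            centroid = [sum(verts[j][d] for j in neighbors[i]) / k
                        for d in range(3)]
            # Move the vertex toward the centroid of its neighbors.
            new_verts.append(tuple(v[d] + lam * (centroid[d] - v[d])
                                   for d in range(3)))
        verts = new_verts
    return verts
```

Because only vertex positions change and the face list is untouched, smoothing of this kind preserves mesh connectivity, which matters when a fixed topology is needed for mesh convolution.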

Experimental Evaluation

Disentanglement and Generative Performance

A comprehensive evaluation was performed against baselines ($\beta$-VAE, $\beta$-TCVAE) and SOTA methods (Supervised Guided VAE, Attribute VAE) on both synthetic and real data. The primary metric for disentanglement was the Separated Attribute Predictability (SAP) score, supported by regression/classification accuracy on latent codes, Pearson/point-biserial correlation, and mean squared error (MSE).
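The SAP score can be illustrated with a small sketch: for each ground-truth factor, every latent dimension receives a predictability score, and SAP is the gap between the best and second-best dimension, averaged over factors. As a simplifying assumption, the code below scores both continuous and binary factors with the squared Pearson correlation (which equals the squared point-biserial correlation for a binary factor); the original SAP definition uses a classification-accuracy score for categorical factors, and the paper's exact evaluation code may differ. The function names are hypothetical.

```python
import math

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant sequence carries no linear information
    return cov / math.sqrt(var_a * var_b)

def sap_score(latents, factors):
    """Simplified SAP: mean gap between the top two latent scores per factor.

    latents : list of per-dimension value lists (needs at least two dimensions)
    factors : list of per-factor value lists (e.g. MS label, age)
    """
    gaps = []
    for f in factors:
        scores = sorted((pearson_r(z, f) ** 2 for z in latents), reverse=True)
        gaps.append(scores[0] - scores[1])
    return sum(gaps) / len(gaps)
```

A score near 1 means each factor is captured by one latent dimension and no other; a score near 0 means at least two dimensions predict the factor about equally well.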

Figure 2: Reconstructions of synthetic torus shapes; left: color codes the difference between target and reconstruction, right: decoder output as $z_1$ (bump) and $z_2$ (scale) are varied independently.

The proposed Supervised Contrastive VAE (SC VAE) achieved superior SAP scores, high classification/regression accuracy, and favorable MSE on both synthetic and hippocampus datasets, consistently outperforming baselines in latent factor separation.

Figure 3: Reconstructions of original and synthesized hippocampus meshes; left: reconstruction accuracy, right: reference hippocampus mesh.

Ablation Analysis

An ablation study established the impact of the inhibition mechanism: removing inhibition (i.e., setting the inhibition weight $\lambda_2 = 0$) reduced SAP by 5–15%, confirming that explicit decorrelation of non-target latent variables is essential for optimal disentanglement.

Computational Considerations

The SC VAE, while requiring marginally longer training time than other SOTA supervised models (e.g., Attribute VAE), remains highly practical for 3D mesh data, with inference times suitable for batch deployment.

Disentangled Shape Analysis

The effectiveness of the model is demonstrated both quantitatively and qualitatively:

Figure 4: SAP disentanglement scores for classification and regression on synthetic torus (left) and hippocampus (right); higher bars indicate better latent selectivity.

Figure 5: Mesh visualizations of hippocampus volume change from healthy to MS across ages; yellow marks regions with highest atrophy.

Figure 6: Quantified hippocampal volume differences (healthy vs. MS) as a function of age for left and right hemispheres.

These results reveal nuanced, spatially localized atrophy patterns on the right hippocampus in MS subjects, in agreement with established neuroimaging findings. The generative model also captures progressive age-related volume loss.
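Volume comparisons of this kind require a volume estimate for each generated mesh. One common way to obtain it for a closed, consistently oriented triangle mesh is to sum signed tetrahedron volumes formed by each face and the origin. The sketch below shows that standard technique; it is an assumption that something similar underlies the reported volumes, and `mesh_volume` is a hypothetical name.

```python
def mesh_volume(vertices, faces):
    """Volume of a closed triangle mesh with consistently oriented faces.

    vertices : list of (x, y, z) tuples
    faces    : list of (i, j, k) vertex-index triples
    """
    total = 0.0
    for a, b, c in faces:
        (x1, y1, z1) = vertices[a]
        (x2, y2, z2) = vertices[b]
        (x3, y3, z3) = vertices[c]
        # Scalar triple product v1 . (v2 x v3) equals six times the signed
        # volume of the tetrahedron spanned by the face and the origin.
        total += (x1 * (y2 * z3 - z2 * y3)
                  - y1 * (x2 * z3 - z2 * x3)
                  + z1 * (x2 * y3 - y2 * x3))
    return abs(total) / 6.0
```

With volumes for a healthy and an MS mesh generated at the same age, a relative difference such as `(v_ms - v_healthy) / v_healthy` gives one simple per-age summary of atrophy.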

Implications and Limitations

The supervised contrastive disentanglement approach for 3D meshes developed in this work is directly transferable to other anatomical or morphological modeling problems where both continuous and categorical covariates drive structural variability. It offers an interpretable, scalable alternative to feed-forward classifier-based supervision, reducing architectural overhead and enhancing statistical specificity. This could facilitate, for example, factor-specific statistical atlases, conditional simulation for prognosis, or morphometric biomarker discovery.

Limitations stem from the need for mesh registration and topology normalization and from the unavailability of longitudinal data. The capacity to generalize to population-level variability is thus constrained by sample size and the cross-sectional design, particularly in the MS cohort. For greater clinical utility, integration with nonrigid deformation-insensitive feature extractors and training on larger longitudinal cohorts is advisable.

Conclusion

This paper advances the methodology for supervised disentanglement of 3D anatomical shape variation in neuroimaging, introducing a computationally efficient graph VAE architecture with a supervised contrastive loss that enables robust, interpretable separation of clinical (disease) and demographic (age) factors in the latent space. The model yields state-of-the-art SAP disentanglement and supports conditional generative exploration of anatomical outcomes, recovering previously reported patterns of hippocampal atrophy in MS as a function of age. Future developments should target topology-agnostic mesh processing, scaling to richer multimodal clinical datasets, and further optimization of supervised excitation-inhibition strategies for interpretable representation learning.
