Disentangling Hippocampal Shape Variations: A Study of Neurological Disorders Using Mesh Variational Autoencoder with Contrastive Learning

Published 31 Mar 2024 in cs.CV, cs.LG, and q-bio.NC (arXiv:2404.00785v3)

Abstract: This paper presents a comprehensive study focused on disentangling hippocampal shape variations from diffusion tensor imaging (DTI) datasets within the context of neurological disorders. Leveraging a Mesh Variational Autoencoder (VAE) enhanced with Supervised Contrastive Learning, our approach aims to improve interpretability by disentangling two distinct latent variables corresponding to age and the presence of diseases. In our ablation study, we investigate a range of VAE architectures and contrastive loss functions, showcasing the enhanced disentanglement capabilities of our approach. This evaluation uses synthetic 3D torus mesh data and real 3D hippocampal mesh datasets derived from the DTI hippocampal dataset. Our supervised disentanglement model outperforms several state-of-the-art (SOTA) methods like attribute and guided VAEs in terms of disentanglement scores. Our model distinguishes between age groups and disease status in patients with Multiple Sclerosis (MS) using the hippocampus data. Our Mesh VAE with Supervised Contrastive Learning shows the volume changes of the hippocampus of MS populations at different ages, and the result is consistent with the current neuroimaging literature. This research provides valuable insights into the relationship between neurological disorder and hippocampal shape changes in different age groups of MS populations using a Mesh VAE with Supervised Contrastive loss. Our code is available at https://github.com/Jakaria08/Explaining_Shape_Variability

References (35)
  1. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining, pages 2623–2631, 2019.
  2. Going deep in medical image analysis: concepts, methods, challenges, and future directions. IEEE Access, 7:99540–99572, 2019.
  3. A contrastive learning approach for training variational autoencoder priors. Advances in neural information processing systems, 34:480–493, 2021.
  4. James Dean Brown. Point-biserial correlation coefficients. Statistics, 5(3):12–6, 2001.
  5. Understanding disentangling in β-VAE. arXiv preprint arXiv:1804.03599, 2018.
  6. Attri-vae: Attribute-based interpretable representations of medical images with variational autoencoders. Computerized Medical Imaging and Graphics, 104:102158, 2023.
  7. Isolating sources of disentanglement in variational autoencoders. Advances in neural information processing systems, 31, 2018.
  8. The qt interval in patients with covid-19 treated with hydroxychloroquine and azithromycin. Nature medicine, 26(6):808–809, 2020.
  9. Pearson correlation coefficient. Noise reduction in speech processing, pages 1–4, 2009.
  10. Disentangled and controllable face image generation via 3d imitative-contrastive learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5154–5163, 2020.
  11. Guided variational autoencoder for disentanglement learning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 7920–7929, 2020.
  12. Emilien Dupont. Learning disentangled joint continuous and discrete representations. Advances in neural information processing systems, 31, 2018.
  13. Morphometry of anatomical shape complexes with dense deformations and sparse parameters. NeuroImage, 101:35–49, 2014.
  14. Hippocampus segmentation on high resolution diffusion mri. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), pages 1369–1372. IEEE, 2021.
  15. Dava: Disentangling adversarial variational autoencoder. arXiv preprint arXiv:2303.01384, 2023.
  16. 3d shape variational autoencoder latent disentanglement via mini-batch feature swapping for bodies and faces. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18730–18739, 2022.
  17. Analyzing and improving representations with the soft nearest neighbor loss. In International conference on machine learning, pages 2012–2020. PMLR, 2019.
  18. Spiralnet++: A fast and highly efficient mesh convolution operator. In Proceedings of the IEEE/CVF international conference on computer vision workshops, pages 0–0, 2019.
  19. beta-vae: Learning basic visual concepts with a constrained variational framework. In International conference on learning representations, 2016.
  20. Contrastive masked autoencoders are stronger vision learners. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
  21. Memory impairment in multiple sclerosis: relevance of hippocampal activation and hippocampal connectivity. Multiple Sclerosis Journal, 21(13):1705–1712, 2015.
  22. Application of k-nearest neighbor (knn) approach for predicting economic events: Theoretical background. International journal of engineering research and applications, 3(5):605–610, 2013.
  23. Explaining anatomical shape variability: Supervised disentangling with a variational graph autoencoder. In 2023 IEEE 20th International Symposium on Biomedical Imaging (ISBI), pages 1–5. IEEE, 2023.
  24. Disentangling by factorising. In International Conference on Machine Learning, pages 2649–2658. PMLR, 2018.
  25. An introduction to variational autoencoders. Foundations and Trends® in Machine Learning, 12(4):307–392, 2019.
  26. Variational inference of disentangled latent concepts from unlabeled observations. arXiv preprint arXiv:1711.00848, 2017.
  27. Marching cubes: A high resolution 3d surface construction algorithm. In Seminal graphics: pioneering efforts that shaped the field, pages 347–353. 1998.
  28. Voxel structure-based mesh reconstruction from a 3d point cloud. IEEE Transactions on Multimedia, 24:1815–1829, 2021.
  29. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748, 2018.
  30. Structural and functional hippocampal changes in multiple sclerosis patients with intact memory function. Radiology, 255(2):595–604, 2010.
  31. High resolution diffusion tensor imaging of the hippocampus across the healthy lifespan. Hippocampus, 31(12):1271–1284, 2021.
  32. Information bottlenecked variational autoencoder for disentangled 3d facial expression modelling. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 157–166, 2022.
  33. High-resolution diffusion tensor imaging and t2 mapping detect regional changes within the hippocampus in multiple sclerosis. NMR in Biomedicine, 36(9):e4952, 2023.
  34. Explainable artificial intelligence (xai) in deep learning-based medical image analysis. Medical Image Analysis, 79:102470, 2022.
  35. Pointflow: 3d point cloud generation with continuous normalizing flows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4541–4550, 2019.

Summary

  • The paper introduces a novel graph VAE that integrates supervised contrastive losses to effectively disentangle age and disease influences on 3D hippocampal meshes.
  • It leverages SpiralNet++ for mesh convolutions and reports superior SAP disentanglement scores and reconstruction accuracy on synthetic and real datasets.
  • The method provides clinically interpretable models by clearly separating continuous (age) and categorical (MS) factors, with potential for broader anatomical studies.

Disentangling Hippocampal Shape Variations with Graph VAE and Supervised Contrastive Learning

Introduction and Motivation

This study presents a method for disentangling anatomical variation of the hippocampus in neuroimaging data, targeting the separation of age and disease (Multiple Sclerosis, MS) factors using a graph-based variational autoencoder (VAE) framework augmented with supervised contrastive learning. The primary motivation is to enable interpretability and factor-specific shape analysis in 3D mesh data, where traditional image-based or point cloud techniques are insufficient to capture the nuances of anatomical morphology in medical contexts. The approach employs a unified supervised contrastive loss to improve disentanglement of latent representations corresponding to continuous (age) and categorical (disease status) labels, facilitating fine-grained, clinically interpretable generative modeling of 3D hippocampal surfaces.

Methodology

Model Architecture

The network uses a graph VAE whose encoder and decoder are both built on the mesh convolution operator SpiralNet++. A 3D mesh of the hippocampus (or a synthetic shape) is represented by its vertices $X \in \mathbb{R}^{N \times 3}$, which are input to the encoder $f_\phi$ to produce a latent code $z = [z_1, z_2, \dots, z_{d_z}]$. The decoder $f_\theta$ reconstructs the mesh from $z$. The VAE backbone follows the standard $\beta$-VAE formulation, incorporating both reconstruction and Kullback-Leibler divergence losses.

Figure 1

Figure 1: Overall architecture of the graph VAE with separate supervised contrastive losses for classification (disease/MS) and regression (age), with dedicated disentangled latent spaces for each factor.
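SpiralNet++ defines convolution on a mesh by visiting, for every vertex, a fixed-length spiral of neighboring vertices (starting at the vertex itself), concatenating their features in that order, and applying one shared linear layer. The pure-Python sketch below illustrates that idea under stated assumptions: the precomputed spiral index lists, the function name `spiral_conv`, and the toy weights are hypothetical, and a real model would use a tensor library and apply a nonlinearity after each layer.

```python
from typing import List, Sequence

Vector = List[float]


def spiral_conv(
    features: Sequence[Sequence[float]],
    spirals: Sequence[Sequence[int]],
    weight: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[Vector]:
    """Hypothetical SpiralNet++-style layer (sketch).

    features: per-vertex input features, shape (N, C_in)
    spirals:  per-vertex spiral index lists, each of length S (vertex itself first)
    weight:   shared linear map, shape (C_out, S * C_in)
    bias:     shape (C_out,)
    """
    out: List[Vector] = []
    for spiral in spirals:
        # Gather neighbor features in the fixed spiral order and flatten them.
        concat = [x for idx in spiral for x in features[idx]]
        if any(len(row) != len(concat) for row in weight):
            raise ValueError("each weight row must have S * C_in entries")
        # The same linear layer is shared by every vertex.
        out.append(
            [sum(w * x for w, x in zip(row, concat)) + b for row, b in zip(weight, bias)]
        )
    return out


# Toy mesh: 3 vertices, 1 input channel, spirals of length 2.
feats = [[1.0], [2.0], [3.0]]
spirals = [[0, 1], [1, 2], [2, 0]]
print(spiral_conv(feats, spirals, weight=[[1.0, -1.0]], bias=[0.5]))  # [[-0.5], [-0.5], [2.5]]
```

Because each vertex's spiral ordering is fixed, every mesh fed to such a layer must share the same vertex count and connectivity, which is why all meshes are resampled to a consistent topology during preprocessing.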

Supervised Contrastive Loss

Unlike prior approaches such as Guided VAE and Attribute VAE, the proposed model integrates a supervised contrastive loss inspired by the Soft Nearest Neighbor Loss (SNNL). Two loss terms are introduced: $L_{contr}^{cls}$ for classification (MS vs. healthy) and $L_{contr}^{reg}$ for regression (age). An excitation-inhibition mechanism is encoded in the denominator of the loss: it suppresses unintended correlations between the non-target latent dimensions and the supervised factor, which improves selectivity and interpretability.

  • $L_{contr}^{cls}$ promotes proximity in $z_1$ among samples sharing the same binary label (e.g., MS status) and discourages proximity among the others, while inhibiting correlations in the other latent dimensions.
  • $L_{contr}^{reg}$ treats numerical labels in a contrastive regime using a proximity threshold, enforcing similarity in $z_2$ for samples with similar ages.
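As a rough illustration of the SNNL idea behind both terms, the sketch below computes a soft nearest neighbor loss on a single latent dimension. Each sample's "positives" are the other samples with the same label (classification) or with an age within a proximity threshold (regression). The temperature, the threshold value, the stabilizing epsilon, and all function names are assumptions, and the paper's inhibition term over the non-target latent dimensions is not reproduced here.

```python
import math
from typing import Callable, Sequence

EPS = 1e-12  # numerical stabilizer (assumed value)


def soft_nearest_neighbor_loss(
    z: Sequence[float],
    is_positive: Callable[[int, int], bool],
    temperature: float = 1.0,
) -> float:
    """SNNL on one latent dimension; low when positives lie close together."""
    n = len(z)
    if n < 2:
        raise ValueError("need at least two samples")
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue
            # Similarity decays with squared distance in the latent dimension.
            sim = math.exp(-((z[i] - z[j]) ** 2) / temperature)
            den += sim
            if is_positive(i, j):
                num += sim
        total -= math.log(num / (den + EPS) + EPS)
    return total / n


def cls_contrastive_loss(z1, labels, temperature=1.0):
    # Positives: samples sharing the binary label (e.g. MS vs. healthy).
    return soft_nearest_neighbor_loss(z1, lambda i, j: labels[i] == labels[j], temperature)


def reg_contrastive_loss(z2, ages, threshold, temperature=1.0):
    # Positives: samples whose ages differ by at most `threshold`.
    return soft_nearest_neighbor_loss(
        z2, lambda i, j: abs(ages[i] - ages[j]) <= threshold, temperature
    )


z = [0.0, 0.1, 5.0, 5.1]
print(cls_contrastive_loss(z, [0, 0, 1, 1]))  # near 0: classes separated in z
print(cls_contrastive_loss(z, [0, 1, 0, 1]))  # large: classes interleaved
print(reg_contrastive_loss(z, [30, 31, 60, 61], threshold=2))
```

The regression variant reuses the classification machinery and only changes how positives are defined, which mirrors the description of treating numerical labels "in a contrastive regime using a proximity threshold".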

The total loss combines the VAE, classification, and regression terms:

$$L_{contr} = L_{vae} + L_{contr}^{cls} + L_{contr}^{reg}$$
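The $L_{vae}$ term follows the β-VAE recipe of a reconstruction loss plus a β-weighted KL divergence. The sketch below spells that out for a diagonal Gaussian posterior and sums the three terms with unit weights, exactly as the equation is written. Using mean squared error as the reconstruction loss and the function names are assumptions made for illustration.

```python
import math
import random
from typing import Sequence

Mesh = Sequence[Sequence[float]]  # N vertices x 3 coordinates


def reparameterize(mu, logvar, rng: random.Random):
    # z = mu + sigma * eps with eps ~ N(0, 1); in an autodiff framework this keeps sampling differentiable.
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar) -> float:
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def reconstruction_mse(x: Mesh, x_hat: Mesh) -> float:
    # Mean squared error over all vertex coordinates (assumed reconstruction loss).
    sq = [(a - b) ** 2 for v, w in zip(x, x_hat) for a, b in zip(v, w)]
    return sum(sq) / len(sq)


def vae_loss(x: Mesh, x_hat: Mesh, mu, logvar, beta: float) -> float:
    return reconstruction_mse(x, x_hat) + beta * kl_to_standard_normal(mu, logvar)


def total_loss(l_vae: float, l_cls: float, l_reg: float) -> float:
    # L_contr = L_vae + L_contr^cls + L_contr^reg (unit weights, as written).
    return l_vae + l_cls + l_reg


x = [[0.0, 0.0, 0.0]]
x_hat = [[1.0, 0.0, 0.0]]
l_vae = vae_loss(x, x_hat, mu=[1.0], logvar=[0.0], beta=2.0)  # 1/3 + 2 * 0.5
print(total_loss(l_vae, 0.5, 0.25))
```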

Data Pipeline and Preprocessing

Two datasets were processed: a synthetic 3D torus mesh dataset (with ground-truth factors of variation) and a high-resolution DTI hippocampus dataset. Hippocampal segmentation combined automated and manual annotation and was postprocessed via marching cubes isosurface extraction, Laplacian smoothing, alignment, and topological point correspondence using Deformetrica. All meshes were resampled to a consistent topology, which mesh convolution requires.
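Laplacian smoothing is one of the preprocessing steps listed above. A minimal sketch of the uniform (umbrella) variant follows: each vertex moves a fraction `lam` of the way toward the centroid of its one-ring neighbors. The step size, iteration count, and function names are assumptions; the source does not state which Laplacian weighting or parameters were used, and a production pipeline would normally rely on a mesh library.

```python
from typing import List, Sequence, Set, Tuple

Vertex = Tuple[float, float, float]


def one_ring(n_vertices: int, faces: Sequence[Tuple[int, int, int]]) -> List[Set[int]]:
    # Collect each vertex's neighbors from the triangle list.
    nbrs: List[Set[int]] = [set() for _ in range(n_vertices)]
    for a, b, c in faces:
        nbrs[a].update((b, c))
        nbrs[b].update((a, c))
        nbrs[c].update((a, b))
    return nbrs


def laplacian_smooth(vertices, faces, lam: float = 0.5, iterations: int = 1) -> List[Vertex]:
    nbrs = one_ring(len(vertices), faces)
    verts: List[Vertex] = [tuple(v) for v in vertices]
    for _ in range(iterations):
        updated: List[Vertex] = []
        for i, v in enumerate(verts):
            ring = nbrs[i]
            if not ring:  # isolated vertex: leave it in place
                updated.append(v)
                continue
            centroid = [sum(verts[j][d] for j in ring) / len(ring) for d in range(3)]
            updated.append(tuple(v[d] + lam * (centroid[d] - v[d]) for d in range(3)))
        verts = updated  # every vertex moved using the previous iteration's positions
    return verts


tet = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tet_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
print(laplacian_smooth(tet, tet_faces, lam=0.5, iterations=2))
```

Plain Laplacian smoothing shrinks the surface slightly with every iteration, so small step sizes or shrinkage-compensating variants such as Taubin smoothing are common choices.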

Experimental Evaluation

Disentanglement and Generative Performance

A comprehensive evaluation was performed against baselines ($\beta$-VAE, $\beta$-TCVAE) and SOTA methods (Supervised Guided VAE, Attribute VAE) on both synthetic and real data. The primary disentanglement metric was the SAP score, supported by regression/classification accuracy on the latent codes, Pearson and point-biserial correlations, and mean squared error (MSE).

Figure 2

Figure 2: Reconstructions of synthetic torus shapes; left: color codes the difference between target and reconstruction; right: decoder output as $z_1$ (bump) and $z_2$ (scale) are varied independently.
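To illustrate how a synthetic dataset with known factors of variation can be generated, the sketch below builds a torus mesh with a fixed vertex ordering and two controllable factors: a global `scale` and the amplitude of a localized `bump`. The bump shape (a Gaussian in the major angle, centered at angle 0), the radii, the resolution, and all names are assumptions, not the paper's actual generator.

```python
import math
from typing import List, Tuple


def torus_vertices(scale: float, bump: float, n_major: int = 16, n_minor: int = 8,
                   R: float = 1.0, r: float = 0.3,
                   width: float = 0.5) -> List[Tuple[float, float, float]]:
    verts = []
    for i in range(n_major):
        u = 2.0 * math.pi * i / n_major
        d = min(u, 2.0 * math.pi - u)  # angular distance from the bump center at u = 0
        # The tube radius is inflated near u = 0 by a Gaussian bump of amplitude `bump`.
        r_u = r * (1.0 + bump * math.exp(-((d / width) ** 2)))
        for j in range(n_minor):
            v = 2.0 * math.pi * j / n_minor
            x = (R + r_u * math.cos(v)) * math.cos(u)
            y = (R + r_u * math.cos(v)) * math.sin(u)
            z = r_u * math.sin(v)
            verts.append((scale * x, scale * y, scale * z))
    return verts


def torus_faces(n_major: int = 16, n_minor: int = 8) -> List[Tuple[int, int, int]]:
    # Two triangles per grid cell; indices wrap around in both angles.
    faces = []
    for i in range(n_major):
        for j in range(n_minor):
            a = i * n_minor + j
            b = ((i + 1) % n_major) * n_minor + j
            c = ((i + 1) % n_major) * n_minor + (j + 1) % n_minor
            d = i * n_minor + (j + 1) % n_minor
            faces.extend([(a, b, c), (a, c, d)])
    return faces


# Every sample shares the same faces, so only vertex positions vary with the factors.
shapes = [torus_vertices(scale=s, bump=b) for s in (0.8, 1.0, 1.2) for b in (0.0, 0.5)]
faces = torus_faces()
```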

The proposed Supervised Contrastive VAE (SC VAE) achieved superior SAP scores, high classification/regression accuracy, and favorable reconstruction MSE on both the synthetic and hippocampus datasets, consistently outperforming the baselines in latent factor separation.

Figure 3

Figure 3: Reconstructions of original and synthesized hippocampus meshes; left: reconstruction accuracy, right: reference hippocampus mesh.
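The SAP (Separated Attribute Predictability) score asks whether a factor is captured by a single latent dimension: each dimension is scored by how well it alone predicts the factor, and SAP is the gap between the best and second-best dimensions. The sketch below uses the squared Pearson correlation as the per-dimension score, which equals the R² of a one-variable linear regression. Applying the same score to the binary MS label (where Pearson's r reduces to the point-biserial correlation) is a simplification, since the original SAP formulation scores discrete factors with a threshold classifier's accuracy. Function names and toy data are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:  # constant input: correlation undefined, treat as 0
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def point_biserial(x: Sequence[float], labels: Sequence[int]) -> float:
    # Point-biserial correlation is Pearson's r against a 0/1 variable.
    return pearson(x, [float(lab) for lab in labels])


def sap_score(latents: Sequence[Sequence[float]], factor: Sequence[float]) -> float:
    """latents: one row of d_z latent values per sample (d_z >= 2)."""
    d_z = len(latents[0])
    scores = sorted(
        (pearson([row[k] for row in latents], factor) ** 2 for k in range(d_z)),
        reverse=True,
    )
    return scores[0] - scores[1]  # gap between best and second-best dimension


# Dimension 0 tracks the factor exactly; dimension 1 only weakly correlates with it.
latents = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
print(sap_score(latents, [1.0, 2.0, 3.0, 4.0]))  # about 0.8 (1.0 - 0.2)
```

With several factors (here age and MS status), the per-factor gaps are typically averaged into one score.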

Ablation Analysis

An ablation study established the impact of the inhibition mechanism: removing inhibition (i.e., setting $\lambda_2 = 0$) reduced SAP by 5–15%, confirming that explicit decorrelation of non-target latent variables is essential for optimal disentanglement.

Computational Considerations

The SC VAE, while requiring marginally longer training time than other SOTA supervised models (e.g., Attribute VAE), remains highly practical for 3D mesh data, with inference times suitable for batch deployment.

Disentangled Shape Analysis

The effectiveness of the model is demonstrated both quantitatively and qualitatively.

Figure 4

Figure 4: SAP disentanglement scores for classification and regression on synthetic torus (left) and hippocampus (right); higher bars indicate better latent selectivity.

Figure 5

Figure 5: Mesh visualizations of hippocampus volume change from healthy to MS across ages; yellow marks regions with highest atrophy.

Figure 6

Figure 6: Quantified hippocampal volume differences (healthy vs. MS) as a function of age for left and right hemispheres.

These results reveal nuanced, spatially localized atrophy patterns in the right hippocampus of MS subjects, in agreement with established neuroimaging findings. The generative model also captures progressive age-related volume loss.
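Volume curves like those in Figure 6 require an enclosed volume for each decoded mesh. One standard way to compute it for a closed, consistently oriented triangle mesh is the divergence theorem: sum the signed volumes of the tetrahedra formed by the origin and each face. The sketch below assumes that approach and assumes that healthy and MS meshes decoded for the same age share one face list; the paper's exact volume computation and the function names are not taken from the source.

```python
from typing import Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def mesh_volume(vertices: Sequence[Vertex], faces: Sequence[Face]) -> float:
    """Enclosed volume of a closed, consistently oriented triangle mesh."""
    six_vol = 0.0
    for a, b, c in faces:
        x0, y0, z0 = vertices[a]
        x1, y1, z1 = vertices[b]
        x2, y2, z2 = vertices[c]
        # Scalar triple product p0 . (p1 x p2) = 6 * signed volume of (origin, p0, p1, p2).
        six_vol += (x0 * (y1 * z2 - z1 * y2)
                    + y0 * (z1 * x2 - x1 * z2)
                    + z0 * (x1 * y2 - y1 * x2))
    # abs() makes the result independent of whether all faces wind outward or all inward.
    return abs(six_vol) / 6.0


def volume_difference(healthy: Sequence[Vertex], ms: Sequence[Vertex],
                      faces: Sequence[Face]) -> float:
    # Positive when the MS-conditioned mesh is smaller (atrophy).
    return mesh_volume(healthy, faces) - mesh_volume(ms, faces)


tet = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tet_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
shrunk = [(0.9 * x, 0.9 * y, 0.9 * z) for x, y, z in tet]
print(mesh_volume(tet, tet_faces))  # 1/6 for the unit corner tetrahedron
print(volume_difference(tet, shrunk, tet_faces))
```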

Implications and Limitations

The supervised contrastive disentanglement paradigm for 3D meshes engineered in this work is directly transferable to other anatomical or morphological modeling problems where both continuous and categorical covariates drive structural variability. It offers an interpretable, scalable alternative to feed-forward classifier-based supervision, reducing architectural overhead and enhancing statistical specificity. This could facilitate, for example, factor-specific statistical atlases, conditional simulation for prognosis, or morphometric biomarker discovery.

Limitations stem from the need for mesh registration and topology normalization, and from the unavailability of longitudinal data. The capacity to generalize fully to population-level variability is thus constrained by sample size and the cross-sectional design, particularly for the MS cohort. For greater clinical utility, integrating feature extractors that are insensitive to nonrigid deformation and training on larger longitudinal cohorts is advisable.

Conclusion

This paper advances the methodology for supervised disentanglement of 3D anatomical shape variation in neuroimaging, introducing a computationally efficient graph VAE architecture with a supervised contrastive loss that enables robust, interpretable separation of clinical (disease) and demographic (age) factors in the latent space. The model yields state-of-the-art SAP disentanglement and supports conditional generative exploration of anatomical outcomes, uncovering both expected and previously reported patterns of hippocampal atrophy in MS as a function of age. Future developments should target topology-agnostic mesh processing, scaling to richer multimodal clinical datasets, and further optimization of supervised/excitation-inhibition strategies for interpretable representation learning.