Anatomically Guided Latent Diffusion Models
- AG-LDMs are generative frameworks that combine latent diffusion processes with explicit anatomical constraints to ensure high-fidelity and clinically plausible image synthesis.
- They integrate segmentation supervision, morphological and topological losses, and clinical covariates to control and validate the generated anatomical structures.
- AG-LDMs demonstrate state-of-the-art performance in tasks like image registration, disease progression modeling, and 3D shape generation, supporting robust in silico trials and simulation.
Anatomically Guided Latent Diffusion Models (AG-LDMs) are a class of generative frameworks that combine latent diffusion processes with explicit anatomical constraints to synthesize or manipulate medical images, segmentations, or geometric representations. AG-LDMs enable high-fidelity generation and controllable editing of anatomical structures while maintaining geometric, morphological, and topological validity, addressing challenges in realism, clinical plausibility, and downstream analysis.
1. Theoretical Foundations and General Framework
AG-LDMs build upon latent diffusion models (LDMs), where image or shape data are first compressed to a low-dimensional latent space via a variational autoencoder (VAE) or mesh graph autoencoder. Diffusion models are learned on this latent space, mapping between a Gaussian distribution and the data manifold using a denoising diffusion process. The key advance in AG-LDMs is the explicit incorporation of anatomical guidance—such as segmentation labels, morphological features, topological invariants, or clinical covariates—either during training, in the loss function, or as inference-time conditioning.
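The forward noising process in LDMs typically follows:
$$q(z_t \mid z_{t-1}) = \mathcal{N}\!\left(z_t;\ \sqrt{1-\beta_t}\,z_{t-1},\ \beta_t I\right)$$
for latent code $z_0$ (anatomical image, segmentation, or shape embedding) and noise schedule $\beta_t$, and the reverse process is learned via a score network:
$$p_\theta(z_{t-1} \mid z_t, c) = \mathcal{N}\!\left(z_{t-1};\ \mu_\theta(z_t, t, c),\ \sigma_t^2 I\right)$$
where $c$ denotes anatomical or clinical conditional information (Wu et al., 2024, Kadry et al., 25 Nov 2025, Wan et al., 21 Jan 2026, Kadry et al., 2024).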
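As an illustration of the forward noising step, the following minimal NumPy sketch samples a noisy latent in closed form, $z_t = \sqrt{\bar\alpha_t}\,z_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$, and evaluates the standard noise-prediction MSE. The linear schedule, the latent shape, and all function names are assumptions chosen for the example; none of them is taken from the cited papers.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(z0, t, alpha_bar, rng):
    """Draw z_t ~ q(z_t | z_0) in closed form; returns (z_t, eps)."""
    eps = rng.standard_normal(z0.shape)
    a = alpha_bar[t]
    z_t = np.sqrt(a) * z0 + np.sqrt(1.0 - a) * eps
    return z_t, eps

def noise_prediction_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return float(np.mean((eps_pred - eps) ** 2))

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
z0 = rng.standard_normal((4, 8, 8, 8))  # hypothetical 4-channel 3D latent
z_t, eps = q_sample(z0, t=500, alpha_bar=alpha_bar, rng=rng)

# Stand-in for a trained network: an oracle that returns the true noise.
loss = noise_prediction_loss(eps, eps)
```

In a real model the oracle is replaced by a conditional denoiser that receives `z_t`, the timestep, and the conditioning information.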
The denoising network can be a 3D U-Net, fully connected network (for mesh data), or other architectures depending on the domain and data type (Mozyrska et al., 18 Aug 2025). Anatomical guidance is incorporated by augmenting the loss with topological, segmentational, conditional, or localized geometric constraints, or via explicit input fusion and cross-attention mechanisms.
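Input fusion by channel concatenation can be sketched as follows. The example stacks a noisy latent, a clean context latent, and spatially broadcast covariates along the channel axis. The function name `fuse_inputs`, the tensor shapes, and the covariate encoding are hypothetical and serve only to show the mechanics.

```python
import numpy as np

def fuse_inputs(z_noisy, z_context, covariates):
    """Concatenate latents and broadcast covariates along the channel axis.

    z_noisy, z_context: arrays of shape (C, D, H, W).
    covariates: 1D sequence of K scalar covariates.
    Returns an array of shape (2 * C + K, D, H, W).
    """
    spatial = z_noisy.shape[1:]
    cov = np.asarray(covariates, dtype=float)
    # Each covariate becomes one constant-valued channel.
    cov_maps = np.broadcast_to(
        cov.reshape(-1, *([1] * len(spatial))), (cov.size, *spatial)
    )
    return np.concatenate([z_noisy, z_context, cov_maps], axis=0)

rng = np.random.default_rng(0)
z_noisy = rng.standard_normal((4, 8, 8, 8))
z_context = rng.standard_normal((4, 8, 8, 8))
# Hypothetical encoding: age in years, sex as 0/1, diagnosis as an integer code.
x = fuse_inputs(z_noisy, z_context, covariates=[72.0, 1.0, 2.0])
```

Cross-attention is an alternative fusion route and is not shown here.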
2. Architectural Implementations and Conditioning Strategies
2.1. Segmentation and Geometric Guidance
- Segmentation-guided LDMs, such as those for brain MRI progression modeling, introduce explicit segmentation supervision during both VAE fine-tuning and diffusion model training. A lightweight tissue segmentor (e.g., WarpSeg) computes soft segmentation masks, and differences between predicted and ground-truth masks are penalized via soft-Dice and cross-entropy losses (Wan et al., 21 Jan 2026). The composite network input concatenates noisy and clean latents with clinical covariates, enabling unified conditional generation.
- Latent feature integration is exemplified in LDM-Morph for cardiac deformable registration, where a pre-trained LDM's latent features are extracted via DDIM inversion and fused with global transformer features by a Latent-Global Cross-Attention (LGCA) module. This module swaps queries between latent and image features before MLP and shifted-window multi-head self-attention blocks, so that semantic and global information interact when the registration field is estimated (Wu et al., 2024).
- Controllable anatomical LDMs enable selection and targeting of anatomical substructures via cuboidal “control domains.” These are affinely transformed subgrids extracted in latent or voxel space (L-parsing, V-parsing), decoded with neural field methods, and assessed by differentiable geometric moments and persistent-homology-based topological losses (Kadry et al., 25 Nov 2025).
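The segmentation supervision described in the first item above can be illustrated with a minimal soft-Dice plus cross-entropy loss. This is a sketch under stated assumptions: the equal default weights, the class-first array layout, and the function names are illustrative, and plain voxel-wise cross-entropy is used rather than any specific variant from the cited work.

```python
import numpy as np

def soft_dice_loss(probs, onehot, eps=1e-6):
    """1 - mean per-class soft Dice; probs and onehot have shape (C, ...)."""
    axes = tuple(range(1, probs.ndim))
    inter = np.sum(probs * onehot, axis=axes)
    denom = np.sum(probs, axis=axes) + np.sum(onehot, axis=axes)
    dice = (2.0 * inter + eps) / (denom + eps)
    return float(1.0 - dice.mean())

def cross_entropy_loss(probs, onehot, eps=1e-12):
    """Voxel-wise cross-entropy, summed over classes and averaged over voxels."""
    return float(-np.mean(np.sum(onehot * np.log(probs + eps), axis=0)))

def segmentation_loss(probs, onehot, w_dice=1.0, w_ce=1.0):
    """Weighted sum of the two terms; the weights are assumed, not sourced."""
    return w_dice * soft_dice_loss(probs, onehot) + w_ce * cross_entropy_loss(probs, onehot)

# Toy 4x4 label map with two classes of equal size.
labels = np.array([[0, 0, 1, 1]] * 4)
onehot = np.stack([labels == 0, labels == 1]).astype(float)

perfect = segmentation_loss(onehot, onehot)                    # close to 0
uniform = segmentation_loss(np.full_like(onehot, 0.5), onehot)  # clearly larger
```

A perfect prediction drives both terms to roughly zero, while an uninformative uniform prediction is penalized by both.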
2.2. Topological and Morphological Conditioning
- AG-LDMs can explicitly enforce topological invariants (e.g., connected components, loops, and cavities) in generated structures via persistent homology. In “Anatomica,” the persistent pairs from superlevel set filtrations are partitioned to steer the diffusion process, maximizing desired features and minimizing spurious ones by intensity differences at their birth and death points (Kadry et al., 25 Nov 2025).
- Morphological shape constraints (e.g., regional area, centroid, covariance) are incorporated either through MSE losses with target geometric statistics, or via regression networks for clinical features (e.g., coronary calcium arclength, wall thickness) (Kadry et al., 2024).
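The moment-based constraint in the last item can be sketched as below: mass, centroid, and covariance are computed from a soft mask and compared with targets by squared error. The unweighted sum, the use of the raw (not scale-normalized) covariance, and the function names are assumptions of this example.

```python
import numpy as np

def geometric_moments(mask):
    """Mass, centroid, and covariance of a non-empty soft mask of any dimension."""
    grids = np.meshgrid(*[np.arange(n) for n in mask.shape], indexing="ij")
    coords = np.stack(grids, axis=-1).reshape(-1, mask.ndim).astype(float)
    mass = mask.sum()
    w = mask.reshape(-1) / mass            # normalized weights, sum to 1
    centroid = w @ coords                  # weighted mean position
    centered = coords - centroid
    cov = (centered * w[:, None]).T @ centered
    return float(mass), centroid, cov

def moment_loss(mask, target_mass, target_centroid, target_cov):
    """Sum of squared errors between computed and target moments."""
    mass, centroid, cov = geometric_moments(mask)
    return float(
        (mass - target_mass) ** 2
        + np.sum((centroid - np.asarray(target_centroid)) ** 2)
        + np.sum((cov - np.asarray(target_cov)) ** 2)
    )

# Toy mask: a 3x3 square centered in a 5x5 grid.
mask = np.zeros((5, 5))
mask[1:4, 1:4] = 1.0
mass, centroid, cov = geometric_moments(mask)
loss_at_target = moment_loss(mask, 9.0, [2.0, 2.0], np.eye(2) * 2.0 / 3.0)
```

Because every operation is a smooth function of the mask values, the same computation written in an autodiff framework would yield gradients usable for training or guidance.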
2.3. Clinical and Latent Conditioning
- AG-LDMs for disease progression (e.g., brain MRI longitudinal synthesis) concatenate clinical covariates—such as age, sex, and diagnosis—spatially to the input latent tensor at each diffusion step, and leverage these as global conditioning factors for the noise-prediction U-Net (Wan et al., 21 Jan 2026).
- Classifier-free and energy-based guidance mechanisms can be implemented for user-defined attributes, whereby conditional and unconditional score predictions are combined, or custom anatomical targets imposed via gradients of constraint losses during denoising (Mozyrska et al., 18 Aug 2025, Kadry et al., 2024).
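Both guidance mechanisms in the last item reduce to short update rules, sketched here with NumPy. Classifier-free guidance extrapolates from the unconditional to the conditional noise prediction, and constraint guidance takes a gradient step on a constraint loss. The toy constraint (matching a target mean), the step size, and the function names are assumptions; in practice the gradient would come from automatic differentiation through the decoder and loss.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Combine predictions: scale 0 is unconditional, scale 1 is conditional."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

def constraint_guided_update(z, grad_loss, step_size):
    """Move the latent one gradient-descent step down the constraint loss."""
    return z - step_size * grad_loss

def mean_constraint(z, target_mean):
    """Toy constraint loss (mean(z) - target)^2 and its analytic gradient."""
    diff = z.mean() - target_mean
    grad = np.full_like(z, 2.0 * diff / z.size)
    return float(diff ** 2), grad

eps_u = np.zeros((2, 2))
eps_c = np.ones((2, 2))
eps_guided = classifier_free_guidance(eps_u, eps_c, scale=2.0)

z = np.ones((4, 4))
loss_before, grad = mean_constraint(z, target_mean=3.0)
z_new = constraint_guided_update(z, grad, step_size=4.0)
loss_after, _ = mean_constraint(z_new, target_mean=3.0)
```

In a sampler, the guided noise estimate or the corrected latent would be used in place of the unguided one at each denoising step.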
3. Mathematical Formulations and Loss Functions
AG-LDMs introduce a hierarchy of task-specific loss terms layered atop standard VAE and diffusion objectives:
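- Latent and image-space metric mixing: In LDM-Morph, the similarity loss is a weighted sum of pixel-space MSE and LDM-encoded feature-space MSE,
$$\mathcal{L}_{\text{sim}} = \lambda_{\text{pix}}\,\mathcal{L}_{\text{pix}} + \lambda_{\text{feat}}\,\mathcal{L}_{\text{feat}},$$
where $\mathcal{L}_{\text{pix}}$ and $\mathcal{L}_{\text{feat}}$ compare pixel intensities and encoder features, respectively, and $\lambda_{\text{pix}}, \lambda_{\text{feat}}$ are the mixing weights. A smoothness term on the deformation field encourages topology preservation (Wu et al., 2024).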
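- Anatomical and segmentation losses add soft-Dice and boundary cross-entropy terms to reconstruction and diffusion training:
$$\mathcal{L}_{\text{seg}} = \lambda_{\text{Dice}}\,\mathcal{L}_{\text{Dice}} + \lambda_{\text{CE}}\,\mathcal{L}_{\text{CE}},$$
with nonnegative weights $\lambda_{\text{Dice}}$ and $\lambda_{\text{CE}}$ (Wan et al., 21 Jan 2026).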
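- Geometric loss terms for anatomical control sum over moments (mass, centroid, covariance) against targets, e.g.,
$$\mathcal{L}_{\text{geo}} = (m - m^{*})^2 + \lVert \mu - \mu^{*} \rVert_2^2 + \lVert \tilde{\Sigma} - \tilde{\Sigma}^{*} \rVert_F^2,$$
where $m$ is the mass, $\mu$ the centroid, $\tilde{\Sigma}$ is the scale-normalized covariance, and starred quantities are targets (Kadry et al., 25 Nov 2025).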
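- Topological loss via persistent homology imposes structure on higher-order features. One form consistent with the description in Section 2.2 is
$$\mathcal{L}_{\text{topo}} = \sum_{(b,d) \in \mathcal{P}_{\text{spurious}}} (b - d) \;-\; \sum_{(b,d) \in \mathcal{P}_{\text{desired}}} (b - d),$$
where $(b, d)$ are the birth and death intensities of persistent pairs from a superlevel-set filtration, partitioned into desired and spurious features (Kadry et al., 25 Nov 2025).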
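The partition-and-penalize idea behind a persistence-based topological loss can be sketched on precomputed persistence pairs. This example assumes the pairs come from a superlevel-set filtration (so birth is at least death) and that the desired features are the `num_desired` most persistent ones. That selection rule and the function name are assumptions for illustration, and computing the pairs themselves (for example with a cubical-complex library) is outside the sketch.

```python
import numpy as np

def topological_loss(births, deaths, num_desired):
    """Reward persistence of desired features and penalize the rest.

    births, deaths: intensities of persistence pairs from a superlevel-set
    filtration, so births >= deaths element-wise.
    num_desired: how many features the target topology should contain.
    """
    births = np.asarray(births, dtype=float)
    deaths = np.asarray(deaths, dtype=float)
    persistence = births - deaths
    order = np.argsort(-persistence)       # most persistent first
    keep, suppress = order[:num_desired], order[num_desired:]
    return float(persistence[suppress].sum() - persistence[keep].sum())

# Three connected components: one prominent, two weak.
births = [1.0, 0.9, 0.6]
deaths = [0.0, 0.7, 0.5]
loss_one_component = topological_loss(births, deaths, num_desired=1)
```

Minimizing this quantity pushes birth and death intensities apart for the kept features and together for the suppressed ones.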
4. Applications: Medical Imaging, Shape Generation, and Digital Trials
AG-LDMs are utilized across a spectrum of application domains:
- Image Registration: LDM-Morph demonstrates superior deformable registration on cardiac datasets (CAMUS, EchoNet-Dynamic, ACDC), with higher accuracy (DSC of 0.88–0.89) and lower fold rates (percentage of pixels with non-positive Jacobian determinant, $\det J_\phi \le 0$) than CNN and Transformer baselines, at competitive runtimes (0.10–0.12 s per 2D pair) (Wu et al., 2024).
- Disease Progression Modeling: AG-LDM for brain MRI outperforms BrLP and other SOTA methods in longitudinal generation (e.g., 0.003 MSE and 1.3–2.6% volume MAE on ADNI; >15% reduction vs. previous best), and yields 31× stronger conditioning sensitivity for clinical covariates. Counterfactual simulations produce realistic neurodegenerative atrophy patterns (Wan et al., 21 Jan 2026).
- 3D Anatomy Generation: MeshLDM enables mesh-based LDM generation of left ventricular anatomical shapes with high clinical fidelity (2.4% error in population mean volume) and geometric accuracy (MMD: 13 mm), facilitating robust data augmentation and in silico simulation (Mozyrska et al., 18 Aug 2025).
- Controllable Anatomy Synthesis: Anatomica and related frameworks synthesize anatomical segmentations under localized geometric (volume, centroid, shape) and topological constraints, enabling structurally valid anatomies suitable for virtual clinical trials and device development (Kadry et al., 25 Nov 2025, Kadry et al., 2024).
- Vascular Structure and Virtual Intervention: Coronary morpho-skeletal control LDMs generate arterial segmentations with explicit control over topology and branch skeletons, enabling simulation-ready data for stent and device deployment studies. Topological losses substantially reduce error rates in anatomical realism compared to unconditional baselines (Kadry et al., 2024).
5. Evaluation, Quantitative Performance, and Limitations
In the cited studies, AG-LDMs match or exceed prior state-of-the-art results on both geometric metrics (e.g., Dice, Chamfer distance, Fréchet Morphological Distance/FMD, 1-NNA) and clinical measures (e.g., segmentation volume error, atrophy measurement, component/branch count), as summarized below:
| Domain | Metric(s) | AG-LDM Result | Baseline(s) |
|---|---|---|---|
| Cardiac registration | DSC, % folds | 0.889 DSC/0.178% folds | TransMorph 0.876/0.842% |
| Brain progression | Vol. MAE (Amyg, Hippo) | 1.3% / 2.6% | BrLP 2.8% / 4.4% |
| Shape gen. (MeshLDM) | Mean volume, MMD | 2.4%, 13 mm | N/A |
| Morph-skel. (coronaries) | Top. violation (%) | 0.1% (lumen, topo reg) | 1.4% |
| Anatomical composability | Correct topology (FMD) | Topo. precision 70–90% | <10% (unconditional) |
Ablation studies across implementations emphasize that segmentation and anatomical constraints, either via explicit loss functions or inference-time guidance, are necessary for stability, correct topological structure, and reduced volume/shape error (Wu et al., 2024, Wan et al., 21 Jan 2026, Kadry et al., 25 Nov 2025).
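Two of the registration metrics reported above, the Dice similarity coefficient and the fold percentage, can be computed as in the following sketch. It assumes binary masks and a 2D displacement field stored as a `(2, H, W)` array in pixel units with the deformation $\phi(x) = x + u(x)$. The array layout and function names are choices made for this example.

```python
import numpy as np

def dice_score(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)

def fold_percentage(disp):
    """Percentage of pixels where det(J_phi) <= 0 for phi(x) = x + u(x).

    disp: displacement field of shape (2, H, W) holding (u_y, u_x).
    """
    duy_dy, duy_dx = np.gradient(disp[0])
    dux_dy, dux_dx = np.gradient(disp[1])
    det = (1.0 + duy_dy) * (1.0 + dux_dx) - duy_dx * dux_dy
    return 100.0 * float(np.mean(det <= 0))

dsc = dice_score([[1, 1], [0, 0]], [[1, 0], [0, 0]])

identity = np.zeros((2, 8, 8))                   # no displacement, no folds
yy = np.arange(8, dtype=float)[:, None] * np.ones((1, 8))
flipped = np.stack([-2.0 * yy, np.zeros((8, 8))])  # reflects the y axis
```

The identity field yields no folds, whereas the reflecting field has a negative determinant everywhere.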
Limitations include constraint granularity (e.g., 1D axis in morpho-skeletal LDMs), increased computational cost due to backpropagation for guidance per diffusion step, voxel-based resolution bottlenecks, and partial but not complete elimination of topological violations. Future directions include mesh-based or implicit decoders for higher resolution, richer clinical and morphological attributes, and efficient continual guidance techniques (Kadry et al., 25 Nov 2025, Mozyrska et al., 18 Aug 2025, Kadry et al., 2024).
6. Extensions, Perspectives, and Future Research
AG-LDMs are extensible to multiple anatomical domains (brain, cardiac, vascular, musculoskeletal), data types (volumetric, mesh, segmentation), and levels of anatomical abstraction (regional, local, global, topological). Conditioning mechanisms can be further expanded to incorporate dynamic temporal features (4D trajectories), multiscale or cross-modal data, and direct integration of user-specified clinical targets or device parameters.
This paradigm supports data generation for simulation, counterfactual modeling (e.g., AD conversion, surgical planning), and controllable in silico trials, with potential for module stacking (e.g., combining mesh/LDM generation with photorealistic rendering) and federated anatomical modeling. Comparative studies with transformer and non-diffusion models report that AG-LDMs offer favorable trade-offs among anatomical fidelity, extensible conditioning, and computational efficiency (Wu et al., 2024, Wan et al., 21 Jan 2026, Konz et al., 2024).
7. Representative Models and Key Contributions
Notable implementations and their contributions include:
- LDM-Morph: Unsupervised cardiac registration with LDM latent features and hierarchical similarity metrics, yielding best-in-class topology-preserving warps (Wu et al., 2024).
- MeshLDM: Mesh-based 3D latent diffusion for cardiac anatomy, highlighting conditioning extension potential for clinical priors and user constraints (Mozyrska et al., 18 Aug 2025).
- Segmentation-Guided Diffusion: Mask-conditional pixel-space DDPMs with ablation training, enabling modular, high-fidelity medical image synthesis (Konz et al., 2024).
- Anatomica: Local moment/topological control in 3D anatomical generation via partial latent slicing and persistent homology losses (Kadry et al., 25 Nov 2025).
- AG-LDM for Brain Progression: Unified latent fusion of baseline/follow-up MRI and clinical code, with segmentation supervision, for anatomically consistent disease modeling (Wan et al., 21 Jan 2026).
- Morpho-skeletal Coronary LDM: Differentiable guidance for continuous morphometric and skeletal targets, providing simulation-ready vasculature for virtual intervention studies (Kadry et al., 2024).
These models collectively define the modern landscape of anatomically guided latent diffusion for computational medicine and bioengineering.