Anatomically Guided Latent Diffusion Models

Updated 28 January 2026

AG-LDMs are generative frameworks that combine latent diffusion processes with explicit anatomical constraints to ensure high-fidelity and clinically plausible image synthesis.
They integrate segmentation supervision, morphological and topological losses, and clinical covariates to control and validate the generated anatomical structures.
AG-LDMs demonstrate state-of-the-art performance in tasks like image registration, disease progression modeling, and 3D shape generation, supporting robust in silico trials and simulation.

Anatomically Guided Latent Diffusion Models (AG-LDMs) are a class of generative frameworks that combine latent diffusion processes with explicit anatomical constraints to synthesize or manipulate medical images, segmentations, or geometric representations. AG-LDMs enable high-fidelity generation and controllable editing of anatomical structures while maintaining geometric, morphological, and topological validity, addressing challenges in realism, clinical plausibility, and downstream analysis.

1. Theoretical Foundations and General Framework

AG-LDMs build upon latent diffusion models (LDMs), in which image or shape data are first compressed to a low-dimensional latent space by a variational autoencoder (VAE) or a mesh graph autoencoder. A diffusion model is then learned in this latent space, mapping between a Gaussian distribution and the data manifold through a denoising diffusion process. The key advance in AG-LDMs is the explicit incorporation of anatomical guidance (segmentation labels, morphological features, topological invariants, or clinical covariates), applied during training, in the loss function, or as inference-time conditioning.

The forward noising process in LDMs typically follows: $q(\mathbf{z}_t|\mathbf{z}_0) = \mathcal{N}(\mathbf{z}_t; \sqrt{\bar{\alpha}_t}\,\mathbf{z}_0, (1-\bar{\alpha}_t)I)$ for latent code $\mathbf{z}_0$ (anatomical image, segmentation, or shape embedding), and the reverse process is learned via a score network: $p_\theta(\mathbf{z}_{t-1}|\mathbf{z}_t, \mathbf{y}) = \mathcal{N}(\mathbf{z}_{t-1}; \mu_\theta(\mathbf{z}_t, t, \mathbf{y}), \tilde{\beta}_t I)$ where $\mathbf{y}$ denotes anatomical or clinical conditional information (Wu et al., 2024, Kadry et al., 25 Nov 2025, Wan et al., 21 Jan 2026, Kadry et al., 2024).
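
The closed-form forward process above can be sketched in a few lines. This is a minimal illustration, not code from any of the cited papers: the linear beta schedule, its endpoints, and the names `linear_alpha_bars` and `q_sample` are assumptions, and a flat Python list stands in for a latent tensor.

```python
import math
import random

def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products alpha_bar_t for a linear beta schedule (assumed schedule)."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta_t = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta_t
        alpha_bars.append(running)
    return alpha_bars

def q_sample(z0, t, alpha_bars, noise=None, rng=random):
    """Draw z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I) for a flat latent."""
    if noise is None:
        noise = [rng.gauss(0.0, 1.0) for _ in z0]
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [signal_scale * z + noise_scale * e for z, e in zip(z0, noise)]

alpha_bars = linear_alpha_bars(1000)
z0 = [0.5, -1.0, 2.0]                     # toy latent code
z_early = q_sample(z0, 10, alpha_bars)    # small t: mostly signal
z_late = q_sample(z0, 999, alpha_bars)    # large t: almost pure Gaussian noise
```

Because $\bar{\alpha}_t$ decreases monotonically toward zero, the signal term shrinks and the noise term grows as $t$ increases, which is why any timestep can be sampled directly from $\mathbf{z}_0$ without simulating the intermediate steps.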

The denoising network can be a 3D U-Net, fully connected network (for mesh data), or other architectures depending on the domain and data type (Mozyrska et al., 18 Aug 2025). Anatomical guidance is incorporated by augmenting the loss with topological, segmentational, conditional, or localized geometric constraints, or via explicit input fusion and cross-attention mechanisms.

2. Architectural Implementations and Conditioning Strategies

2.1. Segmentation and Geometric Guidance

Segmentation-guided LDMs, such as those for brain MRI progression modeling, introduce explicit segmentation supervision during both VAE fine-tuning and diffusion model training. A lightweight tissue segmentor (e.g., WarpSeg) computes soft segmentation masks, and differences between predicted and ground-truth masks are penalized via soft-Dice and cross-entropy losses (Wan et al., 21 Jan 2026). The composite network input concatenates noisy and clean latents with clinical covariates, enabling unified conditional generation.
Latent feature integration is exemplified in LDM-Morph for cardiac deformable registration, where a pre-trained LDM's latent features are extracted via DDIM inversion and fused with global transformer features by a Latent-Global Cross-Attention (LGCA) module. This module swaps queries between latent and image features before MLP and shift-window multi-head self-attention blocks, enabling the interaction of semantic and global information for optimized registration fields (Wu et al., 2024).
Controllable anatomical LDMs enable selection and targeting of anatomical substructures via cuboidal “control domains.” These are affinely transformed subgrids extracted in latent or voxel space (L-parsing, V-parsing), decoded with neural field methods, and assessed by differentiable geometric moments and persistent-homology-based topological losses (Kadry et al., 25 Nov 2025).

2.2. Topological and Morphological Conditioning

AG-LDMs can explicitly enforce topological invariants (e.g., connected components, loops, and cavities) in generated structures via persistent homology. In “Anatomica,” the persistent pairs from superlevel set filtrations are partitioned to steer the diffusion process, maximizing desired features and minimizing spurious ones by intensity differences at their birth and death points (Kadry et al., 25 Nov 2025).
Morphological shape constraints (e.g., regional area, centroid, covariance) are incorporated either through MSE losses with target geometric statistics, or via regression networks for clinical features (e.g., coronary calcium arclength, wall thickness) (Kadry et al., 2024).

2.3. Clinical and Latent Conditioning

AG-LDMs for disease progression (e.g., brain MRI longitudinal synthesis) concatenate clinical covariates—such as age, sex, and diagnosis—spatially to the input latent tensor at each diffusion step, and leverage these as global conditioning factors for the noise-prediction U-Net (Wan et al., 21 Jan 2026).
Classifier-free and energy-based guidance mechanisms can be implemented for user-defined attributes, whereby conditional and unconditional score predictions are combined, or custom anatomical targets imposed via gradients of constraint losses during denoising (Mozyrska et al., 18 Aug 2025, Kadry et al., 2024).
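
The two guidance mechanisms mentioned above can be sketched as follows. The classifier-free combination is the standard extrapolation between unconditional and conditional noise predictions. The constraint-gradient shift is one common form borrowed from classifier guidance and is an assumption here; the cited papers may scale or apply the gradient differently. Function names and the toy values are hypothetical.

```python
import math

def classifier_free_guidance(eps_cond, eps_uncond, guidance_scale):
    """Combine predictions: eps = eps_uncond + w * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

def constraint_guided_noise(eps, constraint_grad, alpha_bar_t, step_scale):
    """Shift the noise prediction along the gradient of a constraint loss.

    Assumed form: eps_hat = eps + step_scale * sqrt(1 - alpha_bar_t) * grad_z(loss),
    so that the implied clean latent moves in the direction that lowers the loss.
    """
    shift = step_scale * math.sqrt(1.0 - alpha_bar_t)
    return [e + shift * g for e, g in zip(eps, constraint_grad)]

eps_cond = [1.0, 0.0]      # toy conditional noise prediction
eps_uncond = [0.0, 0.0]    # toy unconditional noise prediction
guided = classifier_free_guidance(eps_cond, eps_uncond, guidance_scale=3.0)
steered = constraint_guided_noise(guided, constraint_grad=[0.2, -0.4],
                                  alpha_bar_t=0.75, step_scale=1.0)
```

A guidance scale of 0 recovers the unconditional prediction and a scale of 1 recovers the conditional one; larger values push samples further toward the conditioning signal.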

3. Mathematical Formulations and Loss Functions

AG-LDMs introduce a hierarchy of task-specific loss terms layered atop standard VAE and diffusion objectives:

Latent and image-space metric mixing: In LDM-Morph, the similarity loss is a weighted sum of pixel-space MSE and LDM-encoded feature-space MSE,

$L_{\text{sim}} = \beta L_{\text{org}} + (1-\beta) L_{\text{lat}}$

where $L_{\text{org}}$ and $L_{\text{lat}}$ compare pixel intensities and encoder features, respectively, and $\beta \in [0, 1]$ sets their relative weight. A smoothness term $L_{\text{smooth}}$ on the deformation field regularizes the warp and encourages topology preservation (Wu et al., 2024).
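
As a small worked instance of the weighted similarity loss, the sketch below mixes a pixel-space MSE with a feature-space MSE. The flat lists stand in for a warped image, a fixed image, and their encoder features; the names `mse` and `similarity_loss` and all numeric values are illustrative assumptions.

```python
def mse(a, b):
    """Mean squared error between two equally sized flat sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def similarity_loss(warped_px, fixed_px, warped_feat, fixed_feat, beta=0.5):
    """L_sim = beta * L_org + (1 - beta) * L_lat."""
    l_org = mse(warped_px, fixed_px)        # pixel-space term
    l_lat = mse(warped_feat, fixed_feat)    # latent feature-space term
    return beta * l_org + (1.0 - beta) * l_lat

# Toy values: L_org = 0.5 and L_lat = 2.0, so beta = 0.25 gives
# 0.25 * 0.5 + 0.75 * 2.0 = 1.625.
loss = similarity_loss([0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [1.0, 3.0], beta=0.25)
```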

Anatomical and segmentation losses add soft-Dice and boundary cross-entropy terms to reconstruction and diffusion training: $\mathcal{L}_{\text{seg}} = \mathcal{L}_{\text{dice}} + \mathcal{L}_{\text{boundary}}$

$\mathcal{L}_{\text{LDM}} = \mathcal{L}_{\text{noise}} + \gamma\, \mathcal{L}_{\text{dice}}$

with $\gamma \ll 1$ (Wan et al., 21 Jan 2026).
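
The segmentation-weighted diffusion objective can be illustrated with a single-channel soft-Dice term added to a noise-prediction MSE. This is a sketch under stated assumptions: the value `gamma=0.01` is only an example of $\gamma \ll 1$, the smoothing constant `eps` is a common convention, and the flat lists stand in for mask and noise tensors.

```python
def soft_dice_loss(pred, target, eps=1e-6):
    """1 - Dice overlap between soft masks with values in [0, 1]."""
    intersection = sum(p * g for p, g in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)

def ldm_loss(eps_pred, eps_true, seg_pred, seg_true, gamma=0.01):
    """L_LDM = L_noise + gamma * L_dice (gamma value is an assumed example)."""
    l_noise = sum((a - b) ** 2 for a, b in zip(eps_pred, eps_true)) / len(eps_pred)
    return l_noise + gamma * soft_dice_loss(seg_pred, seg_true)

perfect = soft_dice_loss([1.0, 1.0, 0.0], [1.0, 1.0, 0.0])   # identical masks
disjoint = soft_dice_loss([1.0, 0.0], [0.0, 1.0])            # no overlap
total = ldm_loss([0.1, -0.2], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0])
```

With a small $\gamma$, the noise-prediction term dominates the gradient while the Dice term acts as a mild anatomical regularizer.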

Geometric loss terms for anatomical control sum over moments (mass, centroid, covariance) against targets, e.g.,

$L_{\text{geo}}^k = \lambda_0 (m_k - \bar{m}_k)^2 + \lambda_1 \lVert p_k - \bar{p}_k \rVert^2 + \lambda_2 \lVert \Sigma_k^n - \bar{\Sigma}_k^n \rVert_F^2$

where $\Sigma_k^n$ is the scale-normalized covariance (Kadry et al., 25 Nov 2025).
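
To make the moment terms concrete, the sketch below computes mass, centroid, and covariance of a 2D soft mask and evaluates the geometric loss against targets. It is illustrative only: the source does not state how the covariance is scale-normalized, so dividing by the trace is an assumption, as are the function names and the unit default weights.

```python
def moments(mask):
    """Mass, centroid (row, col), and 2x2 covariance of a 2D soft mask (mass > 0)."""
    mass = sum(sum(row) for row in mask)
    cr = sum(w * r for r, row in enumerate(mask) for w in row) / mass
    cc = sum(w * c for row in mask for c, w in enumerate(row)) / mass
    srr = src = scc = 0.0
    for r, row in enumerate(mask):
        for c, w in enumerate(row):
            dr, dc = r - cr, c - cc
            srr += w * dr * dr
            src += w * dr * dc
            scc += w * dc * dc
    cov = [[srr / mass, src / mass], [src / mass, scc / mass]]
    return mass, (cr, cc), cov

def normalize_cov(cov):
    """Assumed scale normalization: divide by the trace (left unchanged if zero)."""
    trace = cov[0][0] + cov[1][1]
    return cov if trace == 0 else [[v / trace for v in row] for row in cov]

def geometric_loss(mask, target_mass, target_centroid, target_cov_n,
                   lambdas=(1.0, 1.0, 1.0)):
    """lambda_0 * mass error^2 + lambda_1 * centroid error^2 + lambda_2 * Frobenius^2."""
    mass, centroid, cov = moments(mask)
    cov_n = normalize_cov(cov)
    mass_term = (mass - target_mass) ** 2
    centroid_term = sum((a - b) ** 2 for a, b in zip(centroid, target_centroid))
    cov_term = sum((cov_n[i][j] - target_cov_n[i][j]) ** 2
                   for i in range(2) for j in range(2))
    return lambdas[0] * mass_term + lambdas[1] * centroid_term + lambdas[2] * cov_term

mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 1]]
mass, centroid, cov = moments(mask)      # 4, (1.5, 1.5), 0.25 * identity
loss_same = geometric_loss(mask, mass, centroid, normalize_cov(cov))
loss_bigger = geometric_loss(mask, 6.0, centroid, normalize_cov(cov))
```

In a guidance setting these quantities would be computed on a differentiable decoded mask so that the loss gradient can steer the latent; the pure-Python version here only shows the arithmetic.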

Topological loss via persistent homology imposes structure on higher-order features,

$L_{\text{topo}}^k = -\sum_{p\in\mathcal{Y}_k}| S_k(r_b^p) - S_k(r_d^p) |^2 + \sum_{p\in\mathcal{Z}_k}| S_k(r_b^p) - S_k(r_d^p) |^2$

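where $\mathcal{Y}_k$ and $\mathcal{Z}_k$ are the persistence pairs to be reinforced and suppressed, respectively, and $S_k(r_b^p)$ and $S_k(r_d^p)$ are the values of $S_k$ at the birth and death points of pair $p$ (Kadry et al., 25 Nov 2025).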
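
The topological loss can be evaluated once persistence pairs are available. The sketch below assumes the pairs have already been computed by a persistent-homology routine and are supplied as (birth point, death point) grid coordinates. The toy pairs were derived by hand for a one-row field under a superlevel-set filtration; the function name and data layout are hypothetical.

```python
def topological_loss(field, keep_pairs, suppress_pairs):
    """Negative squared persistence of desired pairs plus squared persistence of spurious ones."""
    def persistence_sq(pair):
        (br, bc), (dr, dc) = pair
        return (field[br][bc] - field[dr][dc]) ** 2

    reward = sum(persistence_sq(p) for p in keep_pairs)        # features to reinforce
    penalty = sum(persistence_sq(p) for p in suppress_pairs)   # features to remove
    return -reward + penalty

# One-row field with peaks at columns 0, 2, and 4.
# Superlevel filtration (0-dimensional features):
#   the peak 0.6 (col 2) merges into the 0.9 component at the saddle 0.2 (col 1);
#   the peak 0.3 (col 4) merges at the saddle 0.25 (col 3).
field = [[0.9, 0.2, 0.6, 0.25, 0.3]]
keep = [((0, 2), (0, 1))]       # treat the second component as a desired feature
suppress = [((0, 4), (0, 3))]   # treat the small bump as spurious
loss = topological_loss(field, keep, suppress)   # -(0.4**2) + 0.05**2
```

Minimizing this quantity widens the intensity gap of desired features and closes the gap of spurious ones, which matches the description of steering by intensity differences at birth and death points.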

4. Applications: Medical Imaging, Shape Generation, and Digital Trials

AG-LDMs are utilized across a spectrum of application domains:

Image Registration: LDM-Morph demonstrates superior deformable registration on cardiac datasets (CAMUS, EchoNet-Dynamic, ACDC), with higher accuracy (DSC of 0.88–0.89) and lower fold rates (percentage of points with $\det J_\varphi \leq 0$) than CNN and Transformer baselines, at competitive runtimes (0.10–0.12 s per 2D pair) (Wu et al., 2024).
Disease Progression Modeling: AG-LDM for brain MRI outperforms BrLP and other SOTA methods in longitudinal generation (e.g., 0.003 MSE and 1.3–2.6% volume MAE on ADNI; >15% reduction vs. previous best), and yields 31× stronger conditioning sensitivity for clinical covariates. Counterfactual simulations produce realistic neurodegenerative atrophy patterns (Wan et al., 21 Jan 2026).
3D Anatomy Generation: MeshLDM enables mesh-based LDM generation of left ventricular anatomical shapes with high clinical fidelity (2.4% error in population mean volume) and geometric accuracy (MMD: 13 mm), facilitating robust data augmentation and in silico simulation (Mozyrska et al., 18 Aug 2025).
Controllable Anatomy Synthesis: Anatomica and related frameworks synthesize anatomical segmentations under localized geometric (volume, centroid, shape) and topological constraints, enabling structurally valid anatomies suitable for virtual clinical trials and device development (Kadry et al., 25 Nov 2025, Kadry et al., 2024).
Vascular Structure and Virtual Intervention: Coronary morpho-skeletal control LDMs generate arterial segmentations with explicit control over topology and branch skeletons, enabling simulation-ready data for stent and device deployment studies. Topological losses substantially reduce error rates in anatomical realism compared to unconditional baselines (Kadry et al., 2024).
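
The fold rate cited for image registration in the list above is the share of grid locations where the Jacobian determinant of the deformation is non-positive. A minimal sketch for a 2D displacement field follows; the forward-difference discretization, the separate row and column displacement grids, and the function name are assumptions and may differ from the evaluation code used in the cited work.

```python
def fold_percentage(disp_r, disp_c):
    """Percent of grid cells where det J of phi(x) = x + u(x) is <= 0.

    disp_r and disp_c hold the row and column components of the displacement u.
    """
    rows, cols = len(disp_r), len(disp_r[0])
    folded = total = 0
    for r in range(rows - 1):
        for c in range(cols - 1):
            # Forward differences of each displacement component.
            dur_dr = disp_r[r + 1][c] - disp_r[r][c]
            dur_dc = disp_r[r][c + 1] - disp_r[r][c]
            duc_dr = disp_c[r + 1][c] - disp_c[r][c]
            duc_dc = disp_c[r][c + 1] - disp_c[r][c]
            # Jacobian of phi is the identity plus the displacement gradient.
            det = (1.0 + dur_dr) * (1.0 + duc_dc) - dur_dc * duc_dr
            if det <= 0:
                folded += 1
            total += 1
    return 100.0 * folded / total

zeros = [[0.0] * 3 for _ in range(3)]
identity_folds = fold_percentage(zeros, zeros)       # identity map: no folding

# Column displacement u_c = -2c maps column c to -c, a reflection that folds everywhere.
mirror_c = [[-2.0 * c for c in range(3)] for _ in range(3)]
mirror_folds = fold_percentage(zeros, mirror_c)
```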

5. Evaluation, Quantitative Performance, and Limitations

AG-LDMs are reported to match or exceed prior state-of-the-art results on both geometric metrics (e.g., Dice, Chamfer distance, Fréchet Morphological Distance (FMD), 1-NNA) and clinical measures (e.g., segmentation volume error, atrophy measurement, component/branch count), as summarized below:

Domain	Metric(s)	AG-LDM result	Baseline(s)
Cardiac registration	DSC / % folds	0.889 / 0.178%	TransMorph: 0.876 / 0.842%
Brain progression	Volume MAE (amygdala / hippocampus)	1.3% / 2.6%	BrLP: 2.8% / 4.4%
Shape generation (MeshLDM)	Mean volume error / MMD	2.4% / 13 mm	N/A
Morpho-skeletal (coronaries)	Topological violations (%)	0.1% (lumen, with topological regularization)	1.4%
Anatomical composability	Correct topology (FMD)	Topological precision 70–90%	<10% (unconditional)

Ablation studies across implementations emphasize that segmentation and anatomical constraints, either via explicit loss functions or inference-time guidance, are necessary for stability, correct topological structure, and reduced volume/shape error (Wu et al., 2024, Wan et al., 21 Jan 2026, Kadry et al., 25 Nov 2025).

Limitations include constraint granularity (e.g., 1D axis in morpho-skeletal LDMs), increased computational cost due to backpropagation for guidance per diffusion step, voxel-based resolution bottlenecks, and partial but not complete elimination of topological violations. Future directions include mesh-based or implicit decoders for higher resolution, richer clinical and morphological attributes, and efficient continual guidance techniques (Kadry et al., 25 Nov 2025, Mozyrska et al., 18 Aug 2025, Kadry et al., 2024).

6. Extensions, Perspectives, and Future Research

AG-LDMs are extensible to multiple anatomical domains (brain, cardiac, vascular, musculoskeletal), data types (volumetric, mesh, segmentation), and levels of anatomical abstraction (regional, local, global, topological). Conditioning mechanisms can be further expanded to incorporate dynamic temporal features (4D trajectories), multiscale or cross-modal data, and direct integration of user-specified clinical targets or device parameters.

This paradigm supports data generation for simulation, counterfactual modeling (e.g., AD conversion, surgical planning), and controllable in silico trials, with potential for module stacking (e.g., combining mesh/LDM generation with photorealistic rendering) and federated anatomical modeling. Comparative studies with transformer and non-diffusion models suggest that AG-LDMs offer favorable trade-offs among anatomical fidelity, extensible conditioning, and computational efficiency (Wu et al., 2024, Wan et al., 21 Jan 2026, Konz et al., 2024).

7. Representative Models and Key Contributions

Notable implementations and their contributions include:

LDM-Morph: Unsupervised cardiac registration with LDM latent features and hierarchical similarity metrics, yielding best-in-class topology-preserving warps (Wu et al., 2024).
MeshLDM: Mesh-based 3D latent diffusion for cardiac anatomy, highlighting conditioning extension potential for clinical priors and user constraints (Mozyrska et al., 18 Aug 2025).
Segmentation-Guided Diffusion: Mask-conditional pixel-space DDPMs trained with random mask ablation, enabling modular, high-fidelity medical image synthesis (Konz et al., 2024).
Anatomica: Local moment/topological control in 3D anatomical generation via partial latent slicing and persistent homology losses (Kadry et al., 25 Nov 2025).
AG-LDM for Brain Progression: Unified latent fusion of baseline/follow-up MRI and clinical code, with segmentation supervision, for anatomically consistent disease modeling (Wan et al., 21 Jan 2026).
Morpho-skeletal Coronary LDM: Differentiable guidance for continuous morphometric and skeletal targets, providing simulation-ready vasculature for virtual intervention studies (Kadry et al., 2024).

These models collectively define the modern landscape of anatomically guided latent diffusion for computational medicine and bioengineering.


References (6)

Wu et al. (2024). LDM-Morph: Latent diffusion model guided deformable image registration.

Kadry et al. (2025). Anatomica: Localized Control over Geometric and Topological Properties for Anatomical Diffusion Models.

Wan et al. (2026). Anatomically Guided Latent Diffusion for Brain MRI Progression Modeling.

Kadry et al. (2024). A Diffusion Model for Simulation Ready Coronary Anatomy with Morpho-skeletal Control.

Mozyrska et al. (2025). 3D Cardiac Anatomy Generation Using Mesh Latent Diffusion Models.

Konz et al. (2024). Anatomically-Controllable Medical Image Generation with Segmentation-Guided Diffusion Models.
