CheXmask-U: Quantifying uncertainty in landmark-based anatomical segmentation for X-ray images

Published 11 Dec 2025 in cs.CV | (2512.10715v1)

Abstract: Uncertainty estimation is essential for the safe clinical deployment of medical image segmentation systems, enabling the identification of unreliable predictions and supporting human oversight. While prior work has largely focused on pixel-level uncertainty, landmark-based segmentation offers inherent topological guarantees yet remains underexplored from an uncertainty perspective. In this work, we study uncertainty estimation for anatomical landmark-based segmentation on chest X-rays. Inspired by hybrid neural network architectures that combine standard image convolutional encoders with graph-based generative decoders, and leveraging their variational latent space, we derive two complementary measures: (i) latent uncertainty, captured directly from the learned distribution parameters, and (ii) predictive uncertainty, obtained by generating multiple stochastic output predictions from latent samples. Through controlled corruption experiments we show that both uncertainty measures increase with perturbation severity, reflecting both global and local degradation. We demonstrate that these uncertainty signals can identify unreliable predictions by comparing with manual ground-truth, and support out-of-distribution detection on the CheXmask dataset. More importantly, we release CheXmask-U (huggingface.co/datasets/mcosarinsky/CheXmask-U), a large scale dataset of 657,566 chest X-ray landmark segmentations with per-node uncertainty estimates, enabling researchers to account for spatial variations in segmentation quality when using these anatomical masks. Our findings establish uncertainty estimation as a promising direction to enhance robustness and safe deployment of landmark-based anatomical segmentation methods in chest X-ray. A fully working interactive demo of the method is available at huggingface.co/spaces/matiasky/CheXmask-U and the source code at github.com/mcosarinsky/CheXmask-U.


Authors (4)

Summary

The paper derives node-wise uncertainty estimates for CXR landmark segmentation from the variational latent space of a hybrid CNN-GNN architecture (HybridGNet).
It validates both uncertainty measures through occlusion, noise, and OOD experiments, reaching ROC AUC up to 0.98 for OOD detection.
CheXmask-U is a public dataset of 657,566 CXR landmark segmentations with per-node uncertainty estimates, letting users account for spatial variations in segmentation quality.

Uncertainty Quantification in Landmark-Based Anatomical Segmentation for Chest X-rays

Introduction

Quantifying uncertainty in anatomical segmentation has become essential for trustworthy deployment of deep learning-based systems in medical imaging, particularly under clinically relevant conditions such as occlusion, corruption, and out-of-distribution (OOD) samples. Unlike dense pixel-based approaches, landmark-based segmentation provides topo-anatomical guarantees, but until now, fine-grained uncertainty estimation at the node (landmark) level has remained largely understudied. This paper presents a systematic framework for node-wise uncertainty estimation within landmark-based segmentation, focusing on chest X-ray (CXR) analysis using a variational hybrid CNN-GNN architecture.

HybridGNet Architecture for Landmark Segmentation

HybridGNet integrates a CNN-based encoder with a GCNN-based decoder, operating in a probabilistic VAE latent space. The key idea is to use this latent space to derive two complementary uncertainty measures: (i) latent uncertainty, read directly from the learned latent variance and interpreted as model (epistemic) uncertainty, and (ii) predictive uncertainty, computed as the variance over landmark predictions decoded from Monte Carlo latent samples.

Figure 1: Block diagram of HybridGNet showing the extraction of a latent uncertainty $\sigma^2$, followed by sampling-based node-wise predictive uncertainty estimation.

Landmarks are graph-represented anatomical points with a shared adjacency structure, ensuring anatomical feasibility and facilitating structured predictive modeling. The network supports skip connections from the encoder to the decoder, which were analyzed for their impact on uncertainty interpretability, especially under corruption.
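To make the idea of a shared adjacency structure concrete, here is a minimal sketch. It assumes, purely for illustration, that one organ contour is a closed ring of landmarks and that the graph convolution uses the common symmetric normalization; the function names `ring_adjacency` and `normalize` are hypothetical and are not taken from the HybridGNet code.

```python
import numpy as np

def ring_adjacency(n_nodes):
    """Adjacency of a closed contour: node i is linked to i-1 and i+1 (assumed layout)."""
    idx = np.arange(n_nodes)
    nxt = (idx + 1) % n_nodes
    a = np.zeros((n_nodes, n_nodes))
    a[idx, nxt] = 1.0
    a[nxt, idx] = 1.0
    return a

def normalize(a):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, a common choice for graph convolutions."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# The same matrix is reused for every image, since all images share the landmark graph.
A = ring_adjacency(6)
A_norm = normalize(A)
```

Because the adjacency is fixed across images, only the node coordinates vary from one prediction to the next, which is what makes a per-node variance well defined.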

Uncertainty Estimation Methodology

Latent-space uncertainty is quantified as the mean latent variance $\sigma^2$, interpreted as epistemic (model) uncertainty. Predictive uncertainty is estimated by decoding N latent samples through the GCNN in a single batch; the per-node sample variance of the resulting landmark sets serves as the uncertainty measure.

This node-wise uncertainty is highly informative, spatially resolving model confidence and allowing differential trust in various anatomical regions.
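The two measures described above can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the authors' implementation: `toy_decoder` is a hypothetical linear stand-in for the graph decoder, the latent variance is averaged over latent dimensions, and the x and y variances of each node are summed into one value. The source only specifies "per-node sample variance", so the exact reduction may differ.

```python
import numpy as np

def sample_latents(mu, log_var, n_samples, rng):
    """Reparameterized draws z = mu + sigma * eps for a single image."""
    sigma = np.exp(0.5 * log_var)
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + sigma * eps

def latent_uncertainty(log_var):
    """Latent uncertainty: mean of the variance vector (assumed reduction)."""
    return float(np.mean(np.exp(log_var)))

def predictive_uncertainty(landmark_samples):
    """Per-node uncertainty from samples shaped (n_samples, n_nodes, 2).

    Takes the sample variance of each coordinate and sums x and y per node
    (the sum is an assumption made for this sketch).
    """
    return landmark_samples.var(axis=0, ddof=1).sum(axis=-1)

def toy_decoder(z, n_nodes=5):
    """Hypothetical decoder: a fixed linear map from latent codes to 2D landmarks."""
    w = np.random.default_rng(0).standard_normal((z.shape[1], n_nodes * 2))
    return (z @ w).reshape(z.shape[0], n_nodes, 2)

rng = np.random.default_rng(42)
mu = np.zeros(8)                       # encoder mean (toy values)
log_var = np.full(8, np.log(0.25))     # encoder log-variance (toy values)

z = sample_latents(mu, log_var, n_samples=50, rng=rng)
samples = toy_decoder(z)               # all samples decoded in one batch
node_unc = predictive_uncertainty(samples)
lat_unc = latent_uncertainty(log_var)
```

The latent measure needs only one encoder pass, while the predictive measure costs one batched decoder pass over the samples and returns one value per landmark.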

Experimental Validation

Extensive corruption studies investigated the robustness and interpretability of both uncertainty metrics:

Occlusion Experiments: Artificial black-square masks were used to create localized perturbations. Per-node uncertainty systematically increased under occluded regions, confirming strong spatial specificity.
Figure 2: Visualization of uncertainty maps under occlusion, with clear uncertainty elevation in masked zones; boxplots confirm statistical significance.
Noise Corruption: Gaussian noise of increasing magnitude was applied to the images. Both metrics increased monotonically with corruption, with one exception: the skip-connection model's predictive uncertainty became non-monotonic at severe corruption levels, which is attributed to high-resolution features flowing directly to the decoder and bypassing the variational bottleneck.
Figure 3: (a) Both uncertainty measures rise with Gaussian noise, plateauing at high levels; (b) KDEs for OOD detection with strong ID/OOD separation; (c) Strong anti-correlation between average uncertainty and RCA-estimated Dice, evidencing uncertainty's reliability as a quality surrogate.
OOD Detection: On CheXmask, predictive uncertainty delivered excellent ID/OOD discrimination, with ROC AUC up to 0.98 using skip-connections. Latent-space-based anomaly scores (Isolation Forest) also performed well.
Error Prediction: Cross-image and cross-node correlations between node-wise uncertainty and error (relative to multi-expert annotation) were strong, evidencing the practical predictive value of per-node UQ.
Figure 4: High correlation between predictive uncertainty and node-level annotation errors, confirming practical validity of uncertainty estimates.
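The OOD and error-prediction evaluations above reduce to two standard computations: a ROC AUC with uncertainty as the score, and a correlation between uncertainty and error. The sketch below shows both in plain NumPy on made-up numbers. The values are synthetic and do not reproduce any result from the paper, and the use of Pearson correlation is an assumption, since the text does not name the correlation coefficient.

```python
import numpy as np

def auroc(scores_id, scores_ood):
    """ROC AUC as the probability that an OOD score exceeds an ID score (ties count 0.5)."""
    s_id = np.asarray(scores_id, dtype=float)[:, None]
    s_ood = np.asarray(scores_ood, dtype=float)[None, :]
    greater = (s_ood > s_id).sum()
    ties = (s_ood == s_id).sum()
    return (greater + 0.5 * ties) / (s_id.size * s_ood.size)

def pearson(x, y):
    """Pearson correlation between two 1D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

# Synthetic image-level uncertainties (illustrative only).
unc_id = np.array([0.10, 0.20, 0.15, 0.30])
unc_ood = np.array([0.50, 0.25, 0.90])
auc = auroc(unc_id, unc_ood)

# Synthetic node-level uncertainty and landmark error (illustrative only).
node_unc = np.array([1.0, 2.0, 3.0, 4.0])
node_err = np.array([2.0, 4.0, 6.0, 8.5])
r = pearson(node_unc, node_err)
```

The pairwise formulation is quadratic in the number of images, so for large evaluation sets a rank-based or library implementation would be the more practical choice.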

CheXmask-U: Dataset Release

CheXmask-U provides 657,566 chest X-ray segmentations, each annotated with per-node uncertainty for direct community use. Uncertainty metrics were computed via 50 Monte Carlo landmark samples per image, using pre-trained HybridGNet weights. Beyond simply supplementing the anatomical coordinate sets, this resource enables slicing, filtering, or region-focused confidence weighting in downstream CXR tasks, without requiring consumers to re-run computationally intensive UQ techniques.
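As one example of the downstream use described above, per-node uncertainty can drive filtering or confidence weighting of landmarks. The sketch below works on toy arrays. It does not assume any particular file layout or column names for CheXmask-U, and the threshold value and the inverse-uncertainty weighting are hypothetical choices for illustration.

```python
import numpy as np

def reliable_nodes(node_unc, threshold):
    """Boolean mask of landmarks whose uncertainty is at or below a chosen threshold."""
    return np.asarray(node_unc) <= threshold

def confidence_weights(node_unc, eps=1e-8):
    """Inverse-uncertainty weights normalized to sum to 1 (one possible weighting scheme)."""
    w = 1.0 / (np.asarray(node_unc, dtype=float) + eps)
    return w / w.sum()

# Toy landmark coordinates (x, y) and their per-node uncertainties.
landmarks = np.array([[10.0, 20.0], [12.0, 25.0], [40.0, 60.0], [41.0, 62.0]])
node_unc = np.array([0.5, 0.4, 6.0, 0.1])

keep = reliable_nodes(node_unc, threshold=1.0)
trusted = landmarks[keep]               # drop the high-uncertainty node
weights = confidence_weights(node_unc)  # or keep all nodes but down-weight it
```

Hard filtering suits tasks that need a clean subset of landmarks, while soft weighting keeps the full contour and lets uncertain regions contribute less to a downstream loss or measurement.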

Node-level validation against multiple expert annotations demonstrates that CheXmask-U uncertainty estimates reflect region-specific reliability, offering finer granularity than the original image-level RCA-Dice score.

Implications and Future Directions

This node-level UQ framework has direct implications for image triage, automated reporting, and selective use of segmentation outputs in context-aware clinical decision support. The ability to spatially map uncertainty supports safe automation, targeted human review, and can guide adaptive acquisition or annotation strategies.

Clear next steps include adaptation to multi-organ, multi-modality, or 3D settings, integration with more sophisticated epistemic/aleatoric disentanglement methods, and application-specific thresholds for automatic acceptance, rejection, or weighting of predictions.

Conclusion

This paper proposes and validates the first comprehensive node-level uncertainty quantification scheme for landmark-based anatomical segmentation in CXR, underpinned by systematic experiments across occlusion, noise, OOD, and error correlation axes. CheXmask-U constitutes a public, fine-grained resource for advancing anatomically aware, uncertainty-calibrated segmentation in medical imaging.
