SegRap2025 Challenge: Radiotherapy Segmentation

Updated 4 February 2026

SegRap2025 Challenge is a benchmark that evaluates automated segmentation algorithms for radiotherapy target delineation in nasopharyngeal carcinoma using CT scans.
It incorporates multi-center and multi-modality data to simulate clinical variability for tasks such as GTV and LN CTV segmentation.
The challenge drives research on improving model robustness against domain shifts and missing modalities to enhance clinical radiotherapy planning.

SegRap2025 is a benchmark challenge designed to evaluate and advance automated segmentation algorithms for radiotherapy target delineation, specifically focusing on Gross Tumor Volume (GTV) and Lymph Node Clinical Target Volume (LN CTV) in nasopharyngeal carcinoma (NPC) from computed tomography (CT) scans. Building on the single-center, paired-modality dataset and tasks of SegRap2023, SegRap2025 introduces a multi-center, multi-modality framework targeting improved generalizability and robustness of segmentation models across clinical domains. The benchmark provides tasks, datasets, and evaluation protocols that simulate real-world clinical variability, supporting the development of clinically applicable automatic radiotherapy planning systems (Fu et al., 28 Jan 2026).

1. Motivation and Objectives

The accurate delineation of GTV, LN CTV, and organs-at-risk (OARs) is fundamental for effective radiotherapy planning. SegRap2023 previously established that high segmentation accuracy is achievable for large OARs (average DSC ≈ 86.7%), but GTV segmentation remained more challenging, and questions of model transferability across imaging centers and handling of missing modalities were not adequately addressed. SegRap2025 aims to bridge these gaps by explicitly evaluating model performance under domain-shift (cross-center) and missing-modality conditions, thereby catalyzing research on robust, generalizable segmentation techniques for NPC radiotherapy (Fu et al., 28 Jan 2026).

2. Challenge Structure and Task Definition

The SegRap2025 challenge is organized around two clinically relevant tasks:

Task 01: GTV Segmentation. This task requires delineating both the primary tumor GTV (GTVp) and the lymph node GTV (GTVnd) from paired non-contrast CT (ncCT) and contrast-enhanced CT (ceCT) scans. An external testing set evaluates cross-center generalization. Clinically, the goal is precise tumor coverage with minimal exposure of healthy tissue.
Task 02: LN CTV Segmentation. This task requires segmenting six nodal-level structures (left and right Ib, II+III+Va, and IV+Vb+Vc) from either paired CT or single-modality (ceCT-only or ncCT-only) scans. Accurate delineation guides prophylactic irradiation of regions at risk of microscopic disease spread, and the task challenges models with both cross-center and cross-modality variation.

3. Dataset Composition and Organization

Data for SegRap2025 are sourced from multiple centers and structured as follows:

Task	Training	Validation	Testing (Internal)	Testing (External)
GTV (Task 01)	120 labeled paired ncCT+ceCT (SCH) + 500 unlabeled CT	20 paired ncCT+ceCT (SCH)	60 paired ncCT+ceCT (SCH)	60 paired ncCT+ceCT (DHCJ)
LN CTV (Task 02)	262 labeled: 150 paired + 112 single-modality (SCH, SPH, APH, SMU) + 500 unlabeled CT	–	–	40 paired, 30 ceCT-only, 30 ncCT-only (DHCJ)

The inclusion of 500 unlabeled single-modality CTs explicitly enables semi-supervised and self-supervised learning strategies. The external testing cohorts, all from Daguan Hospital of Chengdu Jinjiang (DHCJ), serve to rigorously assess cross-center and cross-modality generalization. Each LN CTV case may represent either paired modalities or a single available scan, reflecting practical clinical realities.
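The mix of paired and single-modality cases described above can be represented with a small record type. The following is a minimal sketch only: the class name `Case`, its fields, the helper `count_by_modality`, and the file names are hypothetical illustrations, not the challenge's actual data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Case:
    """One scan set; either modality may be missing (hypothetical schema)."""
    case_id: str
    ncct_path: Optional[str] = None  # non-contrast CT, if acquired
    cect_path: Optional[str] = None  # contrast-enhanced CT, if acquired

    def modality(self) -> str:
        """Classify the case as 'paired', 'ceCT-only', or 'ncCT-only'."""
        if self.ncct_path and self.cect_path:
            return "paired"
        if self.cect_path:
            return "ceCT-only"
        if self.ncct_path:
            return "ncCT-only"
        raise ValueError(f"case {self.case_id} has no image")


def count_by_modality(cases):
    """Tally how many cases fall into each modality subset."""
    counts = {}
    for case in cases:
        key = case.modality()
        counts[key] = counts.get(key, 0) + 1
    return counts


# Illustrative file names only.
cases = [
    Case("a", ncct_path="a_nc.nii.gz", cect_path="a_ce.nii.gz"),
    Case("b", cect_path="b_ce.nii.gz"),
    Case("c", ncct_path="c_nc.nii.gz"),
]
print(count_by_modality(cases))
```

Making both image fields optional keeps a single code path for paired and single-modality cases, so the missing-modality condition is an explicit state rather than an error.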

4. Evaluation Metrics

Model performance is primarily quantified using:

Dice Similarity Coefficient (DSC): $\mathrm{DSC}(P,Y)=\frac{2|P\cap Y|}{|P|+|Y|}$, quantifying volumetric overlap between the prediction $P$ and the ground truth $Y$.
Normalized Surface Dice (NSD): $\mathrm{NSD}(P,Y)=\frac{|S_P\cap S_Y^{(\tau)}|+|S_Y\cap S_P^{(\tau)}|}{|S_P|+|S_Y|}$, where $S_P$ and $S_Y$ are the surfaces of $P$ and $Y$, and $S^{(\tau)}$ denotes the region within tolerance $\tau$ of a surface. It measures the proportion of surface voxels that lie within $\tau$ of the other surface (1 mm for GTVs, 2 mm for LN CTVs).
Secondary metrics: Average symmetric surface distance and Hausdorff distance are computed for further analysis, but not used in leaderboard ranking.

Predictions are ranked primarily according to DSC, with NSD providing an additional indicator of acceptable boundary placement per clinical standards.
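The two primary metrics can be illustrated on tiny synthetic masks. The sketch below is not the challenge's evaluation code: it assumes masks are given as sets of integer voxel coordinates, defines a surface voxel as one with a 6-connected neighbour outside the mask, and uses a brute-force distance search that is only practical for toy inputs. Real pipelines typically work on arrays with distance transforms. The function names and the empty-mask convention are assumptions.

```python
from itertools import product
from math import dist

NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dice(pred, truth):
    """DSC = 2|P ∩ Y| / (|P| + |Y|) for two sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def surface(mask):
    """Voxels of the mask with at least one 6-connected neighbour outside it."""
    return {
        v for v in mask
        if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask
               for dx, dy, dz in NEIGHBOURS)
    }


def nsd(pred, truth, tau, spacing=(1.0, 1.0, 1.0)):
    """Normalized Surface Dice at tolerance tau (mm), brute-force version."""
    s_p, s_y = surface(pred), surface(truth)
    if not s_p and not s_y:
        return 1.0  # assumed convention for two empty surfaces

    def to_mm(v):
        return tuple(c * s for c, s in zip(v, spacing))

    def count_within(a, b):
        # Number of points in a lying within tau of at least one point in b.
        b_mm = [to_mm(w) for w in b]
        return sum(1 for v in a if any(dist(to_mm(v), w) <= tau for w in b_mm))

    return (count_within(s_p, s_y) + count_within(s_y, s_p)) / (len(s_p) + len(s_y))


# A 4x4x4 cube and the same cube shifted by one voxel along x.
truth = set(product(range(0, 4), range(4), range(4)))
pred = set(product(range(1, 5), range(4), range(4)))

print(dice(pred, truth))          # → 0.75
print(nsd(pred, truth, tau=1.0))  # → 1.0
```

The example shows why the two metrics complement each other: a one-voxel shift costs a quarter of the volumetric overlap, yet every surface voxel still lies within a 1 mm tolerance of the other surface.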

5. Results and Key Findings

Challenge results highlight substantive findings regarding cross-domain generalization, modality robustness, and method efficacy:

Task 01 (GTV Segmentation):
- Internal testing (SCH): Highest average DSC = 74.61%
- External testing (DHCJ): Highest average DSC = 56.79%
- There is an average drop of approximately 18 percentage points (pp) in DSC when transferring models to an unseen clinical center, evidencing unresolved domain-shift challenges.
Task 02 (LN CTV Segmentation):
- Paired CT: Highest average DSC = 60.24%
- ceCT-only: Highest average DSC = 60.50%
- ncCT-only: Highest average DSC = 57.23%
- The performance is stable between paired and ceCT-only subsets but degrades by ~3 pp on ncCT-only, reflecting the loss of contrast-enhanced anatomical information.

No method reached clinical acceptance thresholds on external data (DSC > 83% for GTVp, > 79% for LN CTVs), indicating that manual refinement remains necessary.
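The gaps quoted above follow directly from the best reported scores. The short sketch below redoes that arithmetic; the dictionary keys and variable names are illustrative, and the values are the highest average DSC figures listed in this section.

```python
# Highest average DSC (%) per subset, as listed in this section.
best_dsc = {
    "GTV internal (SCH)": 74.61,
    "GTV external (DHCJ)": 56.79,
    "LN CTV paired": 60.24,
    "LN CTV ceCT-only": 60.50,
    "LN CTV ncCT-only": 57.23,
}

# Cross-center gap for Task 01, in percentage points.
center_gap = best_dsc["GTV internal (SCH)"] - best_dsc["GTV external (DHCJ)"]

# Modality gaps for Task 02, in percentage points.
paired_vs_ncct = best_dsc["LN CTV paired"] - best_dsc["LN CTV ncCT-only"]
cect_vs_ncct = best_dsc["LN CTV ceCT-only"] - best_dsc["LN CTV ncCT-only"]

print(f"cross-center gap: {center_gap:.2f} pp")     # → cross-center gap: 17.82 pp
print(f"paired vs ncCT:   {paired_vs_ncct:.2f} pp")  # → paired vs ncCT:   3.01 pp
print(f"ceCT vs ncCT:     {cect_vs_ncct:.2f} pp")    # → ceCT vs ncCT:     3.27 pp
```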

6. Methodological Approaches

Leading teams drew on the following methodological strategies:

Backbone architectures: nnU-Net variants, U-Net, V-Net, MedNeXt, STU-Net, BLU-Net (Bootstrapped Learning Unified Network).
Pre-processing steps: Body-region cropping, intensity normalization/clipping (e.g., ncCT to [-600, 600] HU, ceCT to [-1000, 1000] HU), resampling to isotropic voxel sizes.
Data augmentation: Spatial (rotation, scaling, mirroring; mirroring disabled for LN CTV), elastic deformation, Gaussian noise/blur, contrast/gamma augmentation, mixup, CutMix, Learnable Bezier Grayscale Transform (LBGT) for pseudo-multi-modal synthesis.
Semi-/self-supervised learning: Pseudo-labeling of unlabeled scans and masked autoencoder pretraining.
Foundation model adaptation: Fine-tuning a lymph-node segmentation foundation model (LN-Seg-FM).
Missing-modality handling: Separate expert models for each modality, generative synthesis using CircleGAN to impute missing scans, modality-priority strategies.
Post-processing: Test-time augmentation, ensembling models across folds or patch sizes, connected-component analysis.

These approaches specifically target feature representation enhancement, increased data diversity, and robustness to both center and modality variability.
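As one concrete illustration of the pre-processing step, the sketch below clips Hounsfield units to the per-modality windows quoted above and rescales them linearly. The windows are the examples given in the text; the rescaling to [0, 1], the function name `clip_and_scale`, and the use of plain Python lists are assumptions made for illustration, and individual teams may have normalized differently.

```python
# Example HU windows quoted in the text; other teams may have used different ones.
HU_WINDOWS = {"ncCT": (-600.0, 600.0), "ceCT": (-1000.0, 1000.0)}


def clip_and_scale(hu_values, modality):
    """Clip HU values to the modality window, then rescale linearly to [0, 1].

    The [0, 1] target range is an assumption; the source only states that
    intensities were clipped and normalized.
    """
    lo, hi = HU_WINDOWS[modality]
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in hu_values]


print(clip_and_scale([-1000, -600, 0, 600, 3000], "ncCT"))
# → [0.0, 0.0, 0.5, 1.0, 1.0]
```

Clipping before scaling keeps extreme values such as air or metal from compressing the soft-tissue range that matters for target delineation.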

7. Clinical Implications and Future Directions

SegRap2025 establishes a multi-center, multi-modality benchmark that reflects real clinical diversity. The documented performance gaps, particularly an ~18 pp decrease in external GTV segmentation and a ~3 pp loss for ncCT-only LN CTV segmentation, confirm that domain generalization and modality robustness remain open challenges.

For clinical translation, delineation accuracy must increase: no submitted method met the clinical DSC thresholds on external test sets, reaffirming the requirement for manual and expert oversight.

Identified avenues for future research include:

Expansion to longitudinal data (mid- and post-treatment scans) to facilitate adaptive radiotherapy.
Incorporation of radiology reports or clinical metadata for additional contextual information.
Advanced domain-generalization strategies such as adaptive normalization, disentanglement learning, and diffusion-based CT contrast synthesis.
Exploitation of vision-language and anatomical graph models for spatial relationship encoding between OARs, GTVs, and CTVs.
Extensive self-supervised or foundation-model pretraining for domain-invariant representation learning.

SegRap2025 provides open access to its multi-center datasets and detailed leaderboard results [https://hilab-git.github.io/SegRap2025_Challenge], establishing a foundation for further research on robust, automated radiotherapy planning (Fu et al., 28 Jan 2026).


References (1)

Fu et al. (2026). SegRap2025: A Benchmark of Gross Tumor Volume and Lymph Node Clinical Target Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma.