BraTS 2021 Challenge Dataset Overview

Updated 16 January 2026

BraTS 2021 is a large-scale federated repository of multi-parametric MRI scans and expert tumor annotations for glioma research.
It features harmonized imaging modalities with standardized preprocessing and expert segmentation protocols across diverse clinical sources.
The dataset supports robust evaluation of advanced algorithms in segmentation and radiogenomic classification using metrics like Dice and Hausdorff distance.

The RSNA-ASNR-MICCAI BraTS 2021 Challenge Dataset is a large-scale, federated repository of multi-parametric magnetic resonance imaging (mpMRI) scans and expert tumor annotations for computational neuro-oncology research. Serving as the benchmark for the 10th BraTS Challenge edition, it encompasses harmonized pre-operative data from 2,040 adult patients with pathologically confirmed gliomas and associated O⁶-methylguanine-DNA methyltransferase (MGMT) promoter methylation status. The dataset offers a structured framework for the evaluation of algorithms targeting brain tumor segmentation and radiogenomic classification, emphasizing reproducibility and generalizability in high-grade and low-grade glioma analysis (Baid et al., 2021).

1. Dataset Structure and Composition

The BraTS 2021 distribution incorporates mpMRI scans from over 80 institutions, leveraging both public biomedical archives (e.g., TCGA-GBM, TCGA-LGG, IvyGAP, CPTAC-GBM, ACRIN-FMISO) and private clinical sources. Each subject is represented by four co-registered, skull-stripped NIfTI volumes:

Native T1-weighted (T1)
Gadolinium-enhanced T1 (T1Gd, also termed T1ce)
T2-weighted (T2)
T2-FLAIR

Training, validation, and test splits are defined for reproducible machine-learning workflows:

Subset	Number of Subjects	Modalities	Annotations Available
Training	1,251	T1, T1Gd, T2, FLAIR	Segmentation + MGMT
Validation	219	T1, T1Gd, T2, FLAIR	None
Testing	570	T1, T1Gd, T2, FLAIR	None

Additionally, a dedicated subset of 585 subjects is available for MGMT promoter methylation classification (Kollias et al., 2023), focusing on glioblastoma cases and recapitulating the molecular heterogeneity observed in clinical populations.

2. Imaging Modalities and Standardized Pre-processing

All four image volumes per subject undergo the following harmonized pre-processing pipeline:

Rigid registration to the SRI24/MNI anatomical template
Automatic brain masking (skull-stripping) via a deep-learning model
Resampling to 1 mm × 1 mm × 1 mm isotropic voxels
Cropping to a bounding box of 240 × 240 × 155 voxels per modality
Intensity normalization: subject- and modality-specific z-score normalization, with voxel intensities transformed as $\hat{I}(x) = (I(x) - \mu)/\sigma$, where $\mu$ and $\sigma$ are the mean and standard deviation computed within the brain mask (Bukhari et al., 2021)
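The z-score step at the end of the pipeline can be sketched as follows. This is a minimal illustration on a flattened toy volume, not the challenge's reference code: the function name `zscore_in_mask` is hypothetical, and treating nonzero voxels as the brain mask is an assumption that only holds when skull-stripping has zeroed the background.

```python
from statistics import fmean, pstdev


def zscore_in_mask(intensities, mask):
    """Z-score the voxels where mask is True; voxels outside the mask become 0.0."""
    inside = [v for v, m in zip(intensities, mask) if m]
    if not inside:
        raise ValueError("brain mask is empty")
    mu = fmean(inside)
    sigma = pstdev(inside)  # population standard deviation over masked voxels
    if sigma == 0:
        raise ValueError("constant intensity inside the mask")
    return [(v - mu) / sigma if m else 0.0 for v, m in zip(intensities, mask)]


# Toy flattened volume: zeros stand for background left by skull-stripping.
volume = [0.0, 0.0, 100.0, 200.0, 300.0, 0.0]
brain_mask = [v > 0 for v in volume]  # assumption: nonzero voxels are brain
normalized = zscore_in_mask(volume, brain_mask)
```

Computing the statistics inside the mask keeps the large zero-valued background from pulling the mean toward zero and inflating the standard deviation.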

The segmentation volumes are provided as NIfTI files with consistent naming conventions, facilitating automated loading and reproducibility across machine-learning pipelines.
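Consistent naming makes it straightforward to enumerate a subject's files programmatically. The sketch below assumes a layout of one folder per subject with files named `<subject_id>_<modality>.nii.gz`; the modality suffixes, the `_seg` suffix, and the example ID are assumptions for illustration and should be checked against the downloaded archive.

```python
from pathlib import Path

# Assumed modality suffixes; verify against the actual distribution.
MODALITIES = ("t1", "t1ce", "t2", "flair")


def subject_files(root, subject_id):
    """Return a dict mapping each modality (and 'seg') to its expected path."""
    folder = Path(root) / subject_id
    files = {m: folder / f"{subject_id}_{m}.nii.gz" for m in MODALITIES}
    files["seg"] = folder / f"{subject_id}_seg.nii.gz"  # training subjects only
    return files


# Hypothetical root directory and subject ID.
files = subject_files("data/train", "BraTS2021_00000")
```

Each path can then be handed to any NIfTI reader; validation and test subjects would simply lack the `seg` entry on disk.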

3. Annotation Protocol and Label Structure

Expert neuroradiologists delineated three tumor sub-regions for each subject, following standardized radiological criteria; all remaining voxels are labeled as background:

Necrotic/non-enhancing tumor core (NCR or NEC; label 1)
Peritumoral edema/infiltrated tissue (ED or PTE; label 2)
Enhancing tumor (ET or ENC; label 4)
Background/healthy tissue (label 0)

For segmentation evaluation, labels are fused into three hierarchical regions:

Derived Region	Label Components
Whole Tumor (WT)	NEC ∪ PTE ∪ ENC
Tumor Core (TC)	NEC ∪ ENC
Enhancing Core	ENC
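The fusion in the table amounts to a set-membership test per voxel. The following sketch works on a flattened toy label map; the dictionary keys (`WT`, `TC`, `ET`) and the function name are illustrative choices, with `ET` standing for the enhancing region.

```python
# Label components of each derived region (1 = NEC, 2 = PTE, 4 = ENC).
REGION_LABELS = {
    "WT": {1, 2, 4},  # whole tumor
    "TC": {1, 4},     # tumor core
    "ET": {4},        # enhancing region
}


def region_masks(labels):
    """Map a flat sequence of voxel labels to one boolean mask per region."""
    return {
        name: [v in members for v in labels]
        for name, members in REGION_LABELS.items()
    }


seg = [0, 1, 2, 4, 2, 0]  # toy flattened segmentation
masks = region_masks(seg)
```

Because the label sets are nested, every enhancing voxel also belongs to the tumor core, and every core voxel to the whole tumor.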

Initial segmentations were generated via label fusion (STAPLE) using outputs from top-ranked BraTS 2020 segmentation methods (nnU-Net, DeepMedic, DeepScan), followed by manual refinement and quality assurance in ITK-SNAP by senior neuroradiologists (Baid et al., 2021).

For radiogenomic classification (MGMT status), each subject is assigned a binary label (0 = unmethylated, 1 = methylated), curated via pyrosequencing ($\geq 10\%$ CpG methylation) or bisulfite sequencing ($\geq 2\%$ CpGs methylated).

4. Technical Challenges in Multicenter Neuroimaging

The BraTS 2021 dataset reflects significant heterogeneity in scanner hardware, MR acquisition protocols, and patient populations:

Variable noise profiles and bias-field artifacts affect cross-site data harmonization
Class imbalance: peritumoral edema (PTE/ED) often dominates tumor volume (~60–80% of abnormal tissue), whereas enhancing core (ENC/ET) can be extremely small ($< 10$ mL) or absent
Tumor size variability: lesion volumes range from under 1 mL to over 100 mL
Subtle intensity distribution shifts persist despite normalization, motivating extensive data augmentation (e.g., gamma correction, geometric transformations)
Tumor sub-regions exhibit non-contiguous, topologically complex morphologies, complicating registration, segmentation, and evaluation workflows (Bukhari et al., 2021)
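Because the volumes are resampled to 1 mm isotropic voxels, sub-region volumes and the class imbalance described above follow directly from voxel counts (1 mL = 1000 mm³). The sketch below uses a synthetic label list whose proportions are chosen for illustration only; the function names are hypothetical.

```python
from collections import Counter

VOXELS_PER_ML = 1000  # 1 mm^3 voxels, and 1 mL = 1000 mm^3


def subregion_volumes_ml(labels):
    """Volume in mL of each non-background label in a flat label sequence."""
    counts = Counter(v for v in labels if v != 0)
    return {label: n / VOXELS_PER_ML for label, n in counts.items()}


def subregion_fractions(labels):
    """Share of the abnormal tissue volume taken by each label."""
    volumes = subregion_volumes_ml(labels)
    total = sum(volumes.values())
    if total == 0:
        return {}  # no tumor voxels in this segmentation
    return {label: vol / total for label, vol in volumes.items()}


# Synthetic example: 7 mL edema, 2 mL necrotic core, 1 mL enhancing tumor.
seg = [2] * 7000 + [1] * 2000 + [4] * 1000 + [0] * 5000
volumes = subregion_volumes_ml(seg)
fractions = subregion_fractions(seg)
```

Per-case fractions like these are one simple way to quantify imbalance before choosing loss weights or sampling strategies.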

For MGMT classification, challenges are compounded by the variable number of slices per volume and the lack of slice-level annotations. Models must be robust to padded slices and leverage volume-level labels without propagating label noise (Kollias et al., 2023).
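One generic way to keep padded slices from influencing a volume-level prediction is to carry a validity mask alongside the padded sequence and aggregate only over real slices. The sketch below illustrates that idea with per-slice scalar logits; it is not the mask layer used by any particular published model, and both function names are hypothetical.

```python
def pad_with_mask(slice_logits, target_len, pad_value=0.0):
    """Pad a per-slice sequence to target_len and return (padded, validity mask)."""
    n = len(slice_logits)
    if n > target_len:
        raise ValueError("volume has more slices than target_len")
    padding = target_len - n
    padded = list(slice_logits) + [pad_value] * padding
    mask = [True] * n + [False] * padding
    return padded, mask


def masked_mean(values, mask):
    """Average only the entries whose mask flag is True."""
    valid = [v for v, keep in zip(values, mask) if keep]
    if not valid:
        raise ValueError("no valid slices")
    return sum(valid) / len(valid)


logits, mask = pad_with_mask([2.0, 4.0, 6.0], target_len=5)
volume_logit = masked_mean(logits, mask)  # uses the 3 real slices only
naive_logit = sum(logits) / len(logits)   # diluted by the 2 padded slices
```

In this toy case the masked average is 4.0 while the naive average drops to 2.4, which shows how padding alone can shift a volume-level score.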

5. Evaluation Metrics and Benchmarking Framework

Segmentation performance in BraTS 2021 is assessed chiefly by the Dice similarity coefficient:

$\mathrm{Dice}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|}$

and the 95th-percentile Hausdorff distance:

$H_{95}(A,B) = \max\{h_{95}(A,B), h_{95}(B,A)\}$

$h_{95}(A,B) = \mathrm{percentile}_{95}\left\{\inf_{b \in B} d(a, b) \mid a \in A \right\}$
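Both definitions can be checked on small examples by representing each segmentation as a set of voxel coordinates. The sketch below is a brute-force illustration rather than the challenge's evaluation code: it uses a nearest-rank percentile (percentile conventions differ between tools), returns 1.0 for the Dice of two empty masks by assumption, and scales quadratically with set size.

```python
import math


def dice(a, b):
    """Dice coefficient between two sets of voxel coordinates."""
    if not a and not b:
        return 1.0  # assumed convention when both masks are empty
    return 2.0 * len(a & b) / (len(a) + len(b))


def percentile_nearest_rank(values, q):
    """q-th percentile using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]


def directed_h95(a, b):
    """95th percentile of the distances from each point of a to its nearest point in b."""
    nearest = [min(math.dist(p, q) for q in b) for p in a]
    return percentile_nearest_rank(nearest, 95)


def hausdorff95(a, b):
    """Symmetric 95th-percentile Hausdorff distance."""
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    return max(directed_h95(a, b), directed_h95(b, a))


A = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
B = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
dice_ab = dice(A, B)
hd95_ab = hausdorff95(A, B)
```

Here the two sets share two of their three voxels, and the unmatched voxel on each side lies one voxel away from the other set.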

Sensitivity and specificity complement these metrics: low sensitivity indicates under-segmentation (missed tumor voxels), while low specificity indicates over-segmentation (healthy voxels labeled as tumor). Teams are ranked by aggregate scores across cases, regions, and metrics, with permutation testing for statistical significance. MGMT promoter methylation prediction models are evaluated by area under the receiver operating characteristic (ROC) curve (AUC), accuracy, F₁-score, and Matthews correlation coefficient.
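The classification-style metrics named above can be computed from a confusion matrix and from pairwise score comparisons. The sketch below is a plain-Python illustration with made-up labels and scores; the AUC is computed as the fraction of positive/negative pairs ranked correctly (ties count one half), and returning 0.0 for an undefined MCC is an assumed convention.

```python
import math


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)


def specificity(tn, fp):
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: two methylated (1) and two unmethylated (0) subjects.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, tn, fp, fn = confusion(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

AUC depends only on the ranking of the scores, whereas sensitivity, specificity, and MCC depend on the chosen threshold (0.5 in this example).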

Challenge submissions are handled via Sage Bionetworks Synapse (segmentation) and Kaggle (radiogenomic classification), ensuring transparent assessment and leaderboard integrity (Baid et al., 2021).

6. Practical Implementations and State-of-the-art Methods

The dataset underpins a suite of contemporary algorithmic developments:

Encoder-decoder architectures based on 3D U-Net and its variants dominate segmentation performance, as highlighted by E1D3 U-Net—a fully convolutional design with one encoder and three decoders, each targeting one hierarchical tumor region (WT, TC, EN) (Bukhari et al., 2021).
Multi-modal methods for MGMT classification—including BTDNet—leverage CNN-RNN combinations, advanced modality fusion, per-slice geometric augmentation, MixAugment for virtual example generation, and test-time augmentation by summing logits across flips and rotations. The use of routing components and mask layers optimizes dataflow for variable-length volumes and restricts predictions to valid slices (Kollias et al., 2023).
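Test-time augmentation by summing logits over flips and rotations can be illustrated on a tiny 2D slice. The sketch below is a generic version of the idea under stated assumptions: `toy_model` is a hypothetical stand-in that returns two class logits, and the set of transforms is chosen for illustration rather than taken from any specific paper.

```python
def identity(img):
    return img


def hflip(img):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in img]


def vflip(img):
    """Reverse the row order (up-down flip)."""
    return img[::-1]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def tta_logits(model, image, transforms):
    """Sum the model's logits over all transformed copies of the image."""
    total = None
    for transform in transforms:
        logits = model(transform(image))
        if total is None:
            total = list(logits)
        else:
            total = [a + b for a, b in zip(total, logits)]
    return total


def toy_model(image):
    """Hypothetical classifier: logits depend on the top-left pixel."""
    top_left = image[0][0]
    everything = sum(sum(row) for row in image)
    return [top_left, everything - top_left]


image = [[1.0, 2.0], [3.0, 4.0]]
summed = tta_logits(toy_model, image, [identity, hflip, vflip, rot90])
predicted_class = max(range(len(summed)), key=summed.__getitem__)
```

Summing logits before taking the argmax lets orientation-dependent errors of the model partially cancel across the augmented views.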

7. Data Access, Limitations, and Directions

All data files are organized by pseudo-anonymized patient IDs, with harmonized directory and file naming conventions. All four modalities and the segmentation labels are provided for every training subject, while the validation and test subsets omit ground truth to support fair challenge participation. The MGMT classification CSV maps each subject ID to exactly one molecular status.
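Reading the ID-to-status mapping is a small CSV task. The sketch below parses an in-memory sample rather than a real file; the column names `BraTS21ID` and `MGMT_value`, the function name, and the sample rows are assumptions for illustration and should be checked against the header of the distributed CSV.

```python
import csv
import io


def load_mgmt_labels(handle, id_column="BraTS21ID", label_column="MGMT_value"):
    """Read a CSV of subject IDs and binary MGMT labels into a dict."""
    labels = {}
    for row in csv.DictReader(handle):
        subject_id = row[id_column]
        value = int(row[label_column])
        if value not in (0, 1):
            raise ValueError(f"unexpected MGMT label {value} for {subject_id}")
        if subject_id in labels:
            raise ValueError(f"duplicate subject ID {subject_id}")
        labels[subject_id] = value  # 0 = unmethylated, 1 = methylated
    return labels


# Made-up sample content; IDs are kept as strings to preserve leading zeros.
sample = io.StringIO("BraTS21ID,MGMT_value\n00000,1\n00002,0\n")
mgmt = load_mgmt_labels(sample)
```

Rejecting duplicates and out-of-range values at load time enforces the one-label-per-subject property before any training code relies on it.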

Limitations include lack of systematic demographic details, persistent multicenter variability, inter-observer variability in segmentations, and non-standardized grade breakdown for MGMT classification. A plausible implication is that future BraTS editions may further refine annotation protocols, expand clinical metadata, or introduce more ambitious radiogenomic endpoints.

BraTS 2021 establishes an extensible benchmark for segmentation and radiogenomic classification in neuro-oncology, supporting reproducible machine-learning research and multi-modal method development across diverse glioma phenotypes (Baid et al., 2021; Bukhari et al., 2021; Kollias et al., 2023).


References

Baid et al. (2021). The RSNA-ASNR-MICCAI BraTS 2021 Benchmark on Brain Tumor Segmentation and Radiogenomic Classification.

Kollias et al. (2023). BTDNet: a Multi-Modal Approach for Brain Tumor Radiogenomic Classification.

Bukhari et al. (2021). E1D3 U-Net for Brain Tumor Segmentation: Submission to the RSNA-ASNR-MICCAI BraTS 2021 Challenge.
