MS-ToT Dataset: MRI Tumor Segmentation
- MS-ToT Dataset is a curated collection of 199 MRI studies focused on expert-validated musculoskeletal soft tissue tumor segmentation across diverse anatomical sites.
- It employs a multi-stage annotation pipeline with radiologist-led delineation and automated registration to ensure high-quality, consistent segmentation masks.
- Benchmarking with U-Net and LiteMedSAM yields Dice scores ranging from 0.67 to 0.79, highlighting strong performance on large tumors and challenges with low-contrast cases.
The MSTT-199 dataset is a curated collection of magnetic resonance imaging (MRI) studies focused on musculoskeletal soft tissue tumor segmentation. Designed to address the challenge of automated, expert-level soft tissue tumor delineation, MSTT-199 comprises 199 patient studies with high-quality, multi-sequence MRI and rigorous radiologist-validated annotations. The dataset supports the development, benchmarking, and cross-domain validation of segmentation algorithms in the context of diverse anatomical locations and tumor tissue types (Reasat et al., 2024).
1. Dataset Construction and Imaging Protocols
MSTT-199 consists of 199 patients, each contributing one MRI study containing both an axial non-contrast T1-weighted (T1) series and an axial T2 fat-saturated (T2-FS) series. Only studies that contained both sequences and passed quality control were included. The anatomical distribution is dominated by extremity tumors, with the thigh (103/199), leg (25/199), and glute (19/199) as the primary sites, complemented by less frequent locations such as the forearm, arm, shoulder, and trunk.
Image acquisition followed routine clinical practice. In-plane resolution and field strength were not standardized, but all studies were resampled to isotropic 1×1×1 mm³ voxels prior to analysis. Axial slice thickness ranged from 3 to 5 mm, with no inter-slice gaps. The following table excerpt outlines the dataset's anatomical heterogeneity:
| Tumor Site | Fibrous | Fat | Myxoid | Nerve | Vascular | Total |
|---|---|---|---|---|---|---|
| Thigh | 21 | 35 | 26 | 10 | 11 | 103 |
| Leg | 5 | 2 | 2 | 11 | 5 | 25 |
| ... | ... | ... | ... | ... | ... | ... |
| Total | 40 | 40 | 40 | 40 | 39 | 199 |
Tumors are distributed across five roughly balanced tissue classes: fibrous (n=40), fat (n=40), myxoid (n=40), nerve (n=40), and vascular (n=39).
2. Annotation Pipeline and Labeling Strategies
The annotation workflow followed a multi-stage, expert-driven protocol:
- Stage 1: A board-certified musculoskeletal radiologist delineated the tumor on the center axial slice, using T2-FS for fibrous, myxoid, nerve, and vascular tumors, and T1 for fat tumors to optimize lesion contrast.
- Stage 2: Three trained annotators extended the segmentation to adjacent slices under radiologist supervision.
- Stage 3: Each completed study underwent a final review by one of three musculoskeletal radiologists.
Annotation was performed in Label Studio, an open-source platform, augmented with ANTs registration to align the T1 and T2 series. Each study has a single binary segmentation mask (tumor versus background), but tumors were also tagged by tissue type to support stratified analyses. Narrow “tail” extensions were excluded if not clearly demarcated. This protocol places expert oversight at each critical stage, promoting annotation consistency and clinical validity.
3. Availability, Splitting, and External Benchmarks
MSTT-199 does not use fixed training/validation/test splits. Instead, all algorithmic benchmarks reported were computed using 5-fold cross-validation over all 199 patients. Slices with fewer than 100 tumor voxels were excluded to avoid trivial masks.
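The evaluation protocol described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `patient_kfold` and `keep_slice`, the shuffling seed, and the round-robin fold assignment are all assumptions, and the repository may split patients differently.

```python
import random

def patient_kfold(patient_ids, n_folds=5, seed=0):
    """Yield (train_ids, val_ids) pairs so each patient is held out exactly once."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)  # fixed seed (assumed) keeps folds reproducible
    folds = [ids[i::n_folds] for i in range(n_folds)]  # round-robin assignment
    for k in range(n_folds):
        val = folds[k]
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield train, val

def keep_slice(mask_slice, min_voxels=100):
    """Return True if a 2D binary mask (rows of 0/1) has at least min_voxels tumor voxels."""
    return sum(sum(row) for row in mask_slice) >= min_voxels

# 199 hypothetical patient indices split into 5 folds.
splits = list(patient_kfold(range(199)))
```

Splitting at the patient level rather than the slice level keeps slices from one study out of both the training and validation sets of the same fold.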
For external benchmarking, a public dataset ("STS", n=51 from TCIA) featuring soft tissue sarcoma MRI was employed. All code, trained models, and metadata for MSTT-199 are available at https://github.com/Reasat/mstt; the repository defaults to the MIT license, although the paper does not explicitly specify licensing terms.
4. Baseline Segmentation Architectures and Preprocessing
Two primary architectures were benchmarked:
- U-Net (2.5D): Processes six-channel input (three consecutive axial slices from T1 and three from T2). Encoder: pre-trained se_resnext50_32x4d; decoder: classical U-shape.
- LiteMedSAM (adapted Segment Anything): Utilizes a prompt encoder receiving a full-image bounding box, a ViT-based image encoder (LiteMedSAM weights), and cross-attention-based mask decoder.
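The six-channel 2.5D input used by the U-Net can be illustrated with NumPy. This is a sketch under stated assumptions: the function name `stack_2p5d` is hypothetical, the channel order (T1 first, then T2) is a guess, and repeating the edge slice at the volume boundary is one possible convention that the source does not specify.

```python
import numpy as np

def stack_2p5d(t1, t2, z):
    """Build a (6, H, W) input from slices z-1, z, z+1 of co-registered volumes.

    t1, t2: arrays of shape (Z, H, W) on the same voxel grid.
    Boundary handling (assumed): the edge slice is repeated.
    """
    idx = np.clip([z - 1, z, z + 1], 0, t1.shape[0] - 1)
    # Assumed channel order: three T1 slices followed by three T2 slices.
    return np.concatenate([t1[idx], t2[idx]], axis=0)
```

Stacking neighboring slices gives a 2D network some through-plane context without the memory cost of full 3D convolutions.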
All volumes were resampled to isotropic 1 mm³ voxels, intensity values were clamped to the 0.05th–99.95th percentile range, and the result was min–max normalized. Input grouping and cropping yielded central 256×256 patches. Data augmentation (random crop, flips, rotation, gamma, brightness/contrast, blur, and grid distortion, each applied with p=0.5) mimicked inter-study variation.
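The intensity normalization and central cropping steps might look like the following NumPy sketch. The function names are hypothetical, resampling is omitted (it would typically be handled by a medical imaging library), and zero-padding images smaller than the patch size is an assumption not stated in the source.

```python
import numpy as np

def normalize_intensity(vol, lo_pct=0.05, hi_pct=99.95):
    """Clamp to the given percentile range, then min-max scale to [0, 1]."""
    lo, hi = np.percentile(vol, [lo_pct, hi_pct])
    vol = np.clip(vol, lo, hi)
    return (vol - lo) / max(hi - lo, 1e-8)  # guard against constant volumes

def center_crop(img, size=256):
    """Take the central size x size patch over the last two axes.

    Images smaller than `size` are zero-padded first (assumed behavior).
    """
    h, w = img.shape[-2:]
    pad_h, pad_w = max(size - h, 0), max(size - w, 0)
    if pad_h or pad_w:
        pad = [(0, 0)] * (img.ndim - 2) + [
            (pad_h // 2, pad_h - pad_h // 2),
            (pad_w // 2, pad_w - pad_w // 2),
        ]
        img = np.pad(img, pad)
        h, w = img.shape[-2:]
    top, left = (h - size) // 2, (w - size) // 2
    return img[..., top:top + size, left:left + size]
```

Percentile clamping before scaling keeps a handful of extreme voxels from compressing the useful intensity range.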
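Training used the Adam optimizer (learning rate 1e-4, batch size 16, 5 epochs) with the binary cross-entropy loss:

$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

where $y_i \in \{0, 1\}$ is the ground-truth label of pixel $i$, $\hat{y}_i$ is the predicted tumor probability, and $N$ is the number of pixels.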
5. Quantitative Results, Stratified Performance, and Error Analysis
Primary evaluation used the Dice score:

$$\mathrm{Dice}(P, G) = \frac{2\,|P \cap G|}{|P| + |G|}$$

where $P$ and $G$ are the predicted and ground-truth masks, respectively.
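As a concrete illustration, the Dice score (twice the overlap divided by the summed sizes of the two masks) can be computed for binary masks as follows. The function name `dice_score` is hypothetical, and returning 1.0 when both masks are empty is one common convention, assumed here rather than taken from the source.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    intersection = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if denom == 0 else 2.0 * intersection / denom
```

For example, a two-pixel prediction that covers a one-pixel ground truth gives 2·1/(2+1) ≈ 0.667.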
| Model | Modality | Train Domain | STS Dice | MSTT-199 Dice |
|---|---|---|---|---|
| U-Net | MRI (T1/T2) | MSTT-199 | 0.79 | 0.68 |
| LiteMedSAM | MRI (T1/T2) | MSTT-199 | 0.80 | 0.67 |
| Multi-Branch UNet¹ | MRI+PET | STS | 0.77 | — |
¹Neubauer et al. (2020)
Stratified Dice scores (U-Net, MSTT-199 domain):
| Tissue Type | Dice |
|---|---|
| Fibrous | 0.507 |
| Fat | 0.744 |
| Myxoid | 0.823 |
| Nerve | 0.748 |
| Vascular | 0.584 |
| Average | 0.681 |
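A quick arithmetic check shows how the average row relates to the per-class values. The sketch below compares an unweighted mean with a mean weighted by the class counts given earlier (40, 40, 40, 40, 39). How the source actually aggregated the scores is not stated, so this is only a consistency check on the rounded table entries.

```python
dice = {"Fibrous": 0.507, "Fat": 0.744, "Myxoid": 0.823, "Nerve": 0.748, "Vascular": 0.584}
counts = {"Fibrous": 40, "Fat": 40, "Myxoid": 40, "Nerve": 40, "Vascular": 39}

# Unweighted mean over the five tissue classes.
macro = sum(dice.values()) / len(dice)

# Mean weighted by the number of tumors per class.
weighted = sum(dice[k] * counts[k] for k in dice) / sum(counts.values())
```

The unweighted mean comes to about 0.6812 and the count-weighted mean to about 0.6817, so the reported 0.681 is consistent with a plain average of the five rounded class scores; because the classes are nearly balanced, the two differ only in the fourth decimal place.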
Models achieved best performance on large tumors (particularly myxoid/fat) and on the thigh/extremities (Dice ≈ 0.77). Performance was weakest for fibrous and vascular tumors (Dice 0.507 and 0.584), attributed to anatomical/size heterogeneity and low signal contrast. Outlier errors were observed for rare anatomical sites (flank, hand, chest wall).
MSTT-199-trained models outperformed prior art for MRI-based segmentation in cross-institutional evaluation (U-Net vs. Multi-Branch U-Net [Neubauer et al., 2020], 0.79 vs. 0.77 on STS).
6. Recommendations and Dataset Limitations
To maximize segmentation accuracy, developers are advised to employ both T1 and T2 modalities in a 2.5D input stack, resample to isotropic 1 mm³ voxels, and adopt robust geometric/intensity augmentations. Multi-plane inference (axial/sagittal/coronal) and test-time augmentation are recommended for further improvement. For anatomically small or low-contrast tumors, region-proposal or bounding-box prompts may increase model focus.
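Test-time augmentation can be sketched as averaging predictions over flipped copies of the input. This is an illustrative example only: `predict_fn` is a hypothetical callable standing in for a trained model, and the choice of flips is an assumption rather than a configuration reported in the source.

```python
import numpy as np

def predict_with_flip_tta(predict_fn, image):
    """Average predictions over the identity, a horizontal flip, and a vertical flip.

    predict_fn: hypothetical callable mapping a (C, H, W) array to an (H, W)
                probability map.
    """
    probs = []
    for axis in (None, -1, -2):  # None = no flip, -1 = width, -2 = height
        x = image if axis is None else np.flip(image, axis=axis)
        p = predict_fn(x)
        # Undo the flip so every prediction is back in the original orientation.
        probs.append(p if axis is None else np.flip(p, axis=axis))
    return np.mean(probs, axis=0)
```

Each prediction is flipped back before averaging, so the result stays aligned with the original image grid.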
Notable limitations include the lack of scanner standardization (field strength, vendor, slice thickness), substantial anatomical imbalance (extremities are better represented than the trunk or head/neck), and tissue imbalance (fewer challenging fibrous/vascular tumors). Suggested mitigations include adding further cases, standardizing intensities (Nyúl transform), correcting bias fields (N4ITK), and self-supervised pretraining on unannotated datasets.
7. Outlook and Research Directions
MSTT-199’s diversity and detailed annotation pipeline support its use as both a primary training resource and a pretraining domain for musculoskeletal tumor segmentation. Nonetheless, extending it to underrepresented anatomical sites and challenging tissue classes should improve generalization. Harmonization strategies, more granular stratification, and semi-supervised or self-supervised learning are promising directions for subsequent work (Reasat et al., 2024).