Influence of training data modality, size, and diversity on segmentation performance

Quantify the influence of training data modalities, dataset size, and data diversity on the performance of segmentation foundation models for microscopy instance segmentation across varied benchmarks.

Background

Training data composition is widely believed to strongly affect the effectiveness and generalization of segmentation foundation models. Microscopy datasets vary widely in modality (e.g., fluorescence, label-free, histopathology), size, and diversity, and each of these factors may affect model performance.

By evaluating a range of models on 36 datasets, Archit et al. highlight the importance of systematically characterizing how training data properties drive performance; a precise quantification of these effects, however, remains to be established.
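One way to make "quantify" concrete is to reduce each training-data property to a number. The sketch below is hypothetical and is not the protocol of Archit et al. It summarizes diversity as the Shannon entropy of the modality mix in a training set. It summarizes the effect of size as the least-squares slope of a benchmark score against log10 of the number of training images. Both measures, the log-linear assumption, and the function names `modality_entropy` and `log_size_slope` are illustrative choices, not taken from the source.

```python
import math
from collections import Counter


def modality_entropy(modalities):
    """Shannon entropy (bits) of the modality mix in a training set.

    Hypothetical diversity measure: 0.0 for a single-modality set,
    log2(k) when k modalities are equally represented.
    """
    counts = Counter(modalities)
    total = sum(counts.values())
    # p * log2(1/p) is non-negative for each modality share p in (0, 1]
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def log_size_slope(sizes, scores):
    """Ordinary least-squares slope of score against log10(training-set size).

    Under an assumed log-linear model, the slope estimates the change in
    benchmark score per tenfold increase in training images.
    """
    xs = [math.log10(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("training-set sizes must not all be equal")
    return cov / var


# Toy inputs for illustration only; these are not results from the paper.
print(modality_entropy(["fluorescence", "label-free"]))
print(log_size_slope([100, 1000, 10000], [0.5, 0.6, 0.7]))
```

A single slope and a single entropy value conflate effects when size and diversity grow together across models. A controlled comparison, with one property varied and the others held fixed, would be needed to attribute performance changes to a specific property.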

References

These developments open up the following questions: (iii) What influence does the training data (modalities, size, data diversity) have on model performance?

Revisiting foundation models for cell instance segmentation (2603.17845 - Archit et al., 18 Mar 2026) in Section 1, Introduction