DERM12345 Dermatoscopic Imaging Dataset

Updated 25 January 2026
  • DERM12345 is a comprehensive, multi-institutional dataset featuring 12,345 high-resolution dermatoscopic images annotated via a four-tier hierarchical diagnostic taxonomy.
  • The dataset supports a range of tasks from binary malignancy detection to 40-way fine-grained subclass classification, enabling rigorous evaluation of machine learning models.
  • Collected over 12 years across tertiary clinics in Türkiye, it includes expert annotations, stratified splits, and preprocessing guidelines for robust deep learning applications.

The DERM12345 dataset is a multi-institutional dermatoscopic image resource comprising 12,345 high-resolution images of skin lesions, annotated across a hierarchical diagnostic taxonomy. Designed to enable nuanced evaluation of machine learning models in dermatological imaging, DERM12345 supports both coarse malignancy detection and fine-grained differential diagnosis with 40 annotated lesion subclasses. Data collection spanned 12 years across tertiary clinics in Türkiye, encompassing a diverse patient population and imaging modalities. This dataset is established as the basis for advanced benchmarking, including hierarchical model evaluation frameworks in recent foundation model studies (Yilmaz et al., 2024, Yuceyalcin et al., 18 Jan 2026).

1. Dataset Structure and Taxonomy

DERM12345 contains dermatoscopic photographs—each corresponding to a unique lesion—sampled from 2008–2020 in Manisa and Istanbul. The image resolutions are device-dependent (ranging from 2000 × 1500 to 3840 × 2160 pixels), requiring downstream resizing for deep learning pipelines.

Images are annotated according to a four-tier clinical taxonomy:

| Level | # Labels | Examples |
|---|---|---|
| Subclass | 40 | Acral Nodular Melanoma, Seborrheic Keratosis, BCC, SCC |
| Main Class | 15 | Compound Nevus, Melanoma |
| Superclass | 4 | Melanocytic Benign/Malignant, Non-melanocytic Benign/Malignant |
| Binary Malignancy | 2 | Malignant, Benign |

The subclass definitions enable fine-grained tasks crucial for differential diagnosis, moving beyond traditional binary (malignant vs benign) paradigms. These tiers underpin the hierarchical benchmarking protocols instituted in downstream studies (Yuceyalcin et al., 18 Jan 2026).

2. Data Acquisition and Sources

Image acquisition utilized a combination of MoleMax HD, FotoFinder® videodermatoscopes, and 3Gen DermLite DL4 devices attached to either mobile or DSLR cameras. All cases originated from clinical workflows in three dermatology centers:

  • Celal Bayar University, Manisa
  • Istinye University – Liv Hospital Vadistanbul, Istanbul
  • University of Health Sciences – Haydarpaşa Numune Hospital, Istanbul

No publicly available datasets were included, ensuring patient diversity typical of the Europe–Asia transition zone (Fitzpatrick skin types II–IV). Modalities included both polarized and non-polarized dermoscopy.

3. Annotation Protocol and Expert Agreement

Initial annotation combined automated extraction and manual metadata review, performed by trained engineers. Board-certified dermatologists with over 20 years of dermoscopy experience (G.G., S.P.Y.) rendered consensus diagnoses on all cases. Malignant lesions were mandatorily biopsy-confirmed; benign and dysplastic diagnoses were validated by either ≥ 2 years of digital follow-up or clinical consensus.

Discrepancies were adjudicated directly; a formal Cohen's κ calculation is possible by random subsampling:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ and $p_e$ denote observed and expected agreement, respectively.
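
The κ formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the dataset's tooling: the function name `cohen_kappa` and the ten labels per rater are invented for the example, and the expected agreement is computed from each rater's marginal label frequencies.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(rater_a)
    # Observed agreement p_o: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement p_e: chance agreement from each rater's marginals.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # convention used here: both raters used one identical label
    return (p_o - p_e) / (1 - p_e)

# Hypothetical random subsample of 10 lesions (labels invented for illustration).
rater_1 = ["nevus", "nevus", "melanoma", "nevus", "bcc",
           "nevus", "melanoma", "bcc", "nevus", "nevus"]
rater_2 = ["nevus", "nevus", "melanoma", "melanoma", "bcc",
           "nevus", "nevus", "bcc", "nevus", "nevus"]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

In this toy case the raters agree on 8 of 10 lesions ($p_o = 0.8$) and the marginals give $p_e = 0.44$, so κ is 0.36 / 0.56.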

4. Metadata Schema and File Format

All images are stored in JPEG (.jpg, 8-bit) with standardized naming:

DERM38_<CenterCode><DeviceCode><PatientID>_<ImageID>.jpg

Metadata is maintained in CSV, with these principal fields:

  1. file_name
  2. super_class
  3. main_class
  4. subclass_label
  5. patient_id (anonymized)
  6. lesion_location (e.g., "dorsum of hand")
  7. device_type (e.g., "MoleMaxHD")
  8. capture_date
  9. biopsy_confirmed (yes/no)
  10. follow_up_months

This rich schema supports flexible downstream stratification and cohort selection.
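
As a rough sketch of cohort selection over this schema, the example below filters metadata rows by device, biopsy status, and follow-up length using only the Python standard library. The three CSV rows, the value encodings (for instance ISO dates and lowercase class names), and the helper `select_cohort` are assumptions made for illustration; the actual metadata file may encode these fields differently.

```python
import csv
import io

# Invented rows that follow the field list above; every value is a placeholder.
SAMPLE_CSV = "\n".join([
    "file_name,super_class,main_class,subclass_label,patient_id,"
    "lesion_location,device_type,capture_date,biopsy_confirmed,follow_up_months",
    "a.jpg,melanocytic_malignant,melanoma,acral_nodular_melanoma,P001,"
    "sole,MoleMaxHD,2015-03-02,yes,0",
    "b.jpg,melanocytic_benign,compound_nevus,compound_nevus,P002,"
    "back,FotoFinder,2017-06-11,no,30",
    "c.jpg,melanocytic_benign,compound_nevus,compound_nevus,P003,"
    "dorsum of hand,MoleMaxHD,2018-01-20,no,12",
])

def select_cohort(rows, device_type=None, biopsy_confirmed=None,
                  min_follow_up_months=None):
    """Return the rows that satisfy every filter that is not None."""
    selected = []
    for row in rows:
        if device_type is not None and row["device_type"] != device_type:
            continue
        is_biopsied = row["biopsy_confirmed"] == "yes"
        if biopsy_confirmed is not None and is_biopsied != biopsy_confirmed:
            continue
        months = int(row["follow_up_months"] or 0)  # treat blank as 0
        if min_follow_up_months is not None and months < min_follow_up_months:
            continue
        selected.append(row)
    return selected

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
biopsy_cohort = select_cohort(rows, biopsy_confirmed=True)
long_follow_up = select_cohort(rows, biopsy_confirmed=False,
                               min_follow_up_months=24)
molemax_cohort = select_cohort(rows, device_type="MoleMaxHD")
```

The 24-month threshold mirrors the two-year follow-up criterion described in the annotation protocol.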

5. Data Splitting and Preprocessing

A stratified data partition is recommended to preserve subclass frequency distribution. Commonly implemented splits are:

  • Training: 70% of images
  • Validation: 15%
  • Test: 15%

Alternatively, dedicated benchmarks employ a 9,860/2,485 split (train/test) with five-fold cross-validation on the training set (stratification by subclass label) (Yuceyalcin et al., 18 Jan 2026).
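
The 9,860/2,485 partition sums to the full 12,345 images, so the test set holds roughly 20.1% of the data. The fold assignment on the training portion can be sketched as follows. This is a simple round-robin scheme written for illustration, not the procedure used in the cited study; in practice scikit-learn's `StratifiedKFold` is the usual tool. The function name `stratified_folds` and the toy labels are assumptions.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, n_folds=5, seed=42):
    """Assign each sample a fold id so that every fold has a similar label mix."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    fold_of = [None] * len(labels)
    offset = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal the samples of this class round-robin across the folds.
        for position, index in enumerate(indices):
            fold_of[index] = (position + offset) % n_folds
        # Continue the rotation so small classes do not all start at fold 0.
        offset += len(indices)
    return fold_of

# Toy training labels (invented); the real training set has 40 subclasses.
labels = ["nevus"] * 10 + ["melanoma"] * 5 + ["bcc"] * 5
fold_of = stratified_folds(labels)

for k in range(5):
    val_idx = [i for i, f in enumerate(fold_of) if f == k]
    train_idx = [i for i, f in enumerate(fold_of) if f != k]
    print(k, len(train_idx), len(val_idx), Counter(labels[i] for i in val_idx))
```

Each fold serves once as the validation set while the remaining four are used for training.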

Preprocessing steps for deep learning include resizing images to target resolutions (e.g., 224×224 or 256×256), followed by normalization (ImageNet mean/std or model-specific constants). During benchmarking, embeddings are precomputed from fixed crops; no real-time augmentation or cropping is utilized at the embedding step.
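
A minimal NumPy sketch of the resize-then-normalize step is given here under explicit assumptions: nearest-neighbour sampling stands in for the bilinear or bicubic interpolation that an imaging library would normally provide, the aspect ratio is not preserved, and the fixed-crop logic of the benchmark is not reproduced. The constants are the standard ImageNet channel statistics; the function names are illustrative.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB, for inputs scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def resize_nearest(image, size):
    """Nearest-neighbour resize of an H x W x 3 array to size x size."""
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size  # source row for each output row
    cols = np.arange(size) * width // size   # source column for each output column
    return image[rows[:, None], cols[None, :]]

def preprocess(image_uint8, size=224):
    """Resize, scale to [0, 1], normalize per channel, and return C x H x W."""
    resized = resize_nearest(image_uint8, size)
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    return np.transpose(normalized, (2, 0, 1))  # channels-first layout

# Dummy all-white image at the smallest native resolution (2000 x 1500).
dummy = np.full((1500, 2000, 3), 255, dtype=np.uint8)
tensor = preprocess(dummy)
print(tensor.shape, tensor.dtype)
```

Models with their own normalization constants would swap in those values for `IMAGENET_MEAN` and `IMAGENET_STD`.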

6. Hierarchical Benchmarking and Evaluation

DERM12345 underpins a four-level hierarchical evaluation, as formalized in recent foundation model studies (Yuceyalcin et al., 18 Jan 2026). Models are trained at the finest subclass level (40-way), with aggregate predictions derived via probability summation over child subclasses per parent label:

$$p_{\text{Parent}} = \sum_{s \in \{\text{Child Subclasses}\}} p_{s}$$
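
The summation can be illustrated with a small Python sketch. The five subclasses, their parent assignments, and the probabilities are chosen only for illustration (the real taxonomy has 40 subclasses and an intermediate main-class tier that is skipped here), and the helper `aggregate` is a hypothetical name.

```python
from collections import defaultdict

# Illustrative child -> parent maps; not the dataset's full taxonomy.
SUBCLASS_TO_SUPERCLASS = {
    "acral_nodular_melanoma": "melanocytic_malignant",
    "compound_nevus": "melanocytic_benign",
    "seborrheic_keratosis": "non_melanocytic_benign",
    "bcc": "non_melanocytic_malignant",
    "scc": "non_melanocytic_malignant",
}
SUPERCLASS_TO_BINARY = {
    "melanocytic_malignant": "malignant",
    "non_melanocytic_malignant": "malignant",
    "melanocytic_benign": "benign",
    "non_melanocytic_benign": "benign",
}

def aggregate(child_probs, child_to_parent):
    """Sum child probabilities into their parent label."""
    parent_probs = defaultdict(float)
    for child, prob in child_probs.items():
        parent_probs[child_to_parent[child]] += prob
    return dict(parent_probs)

# Hypothetical softmax output of a subclass-level classifier for one image.
subclass_probs = {
    "acral_nodular_melanoma": 0.10,
    "compound_nevus": 0.50,
    "seborrheic_keratosis": 0.15,
    "bcc": 0.20,
    "scc": 0.05,
}
superclass_probs = aggregate(subclass_probs, SUBCLASS_TO_SUPERCLASS)
binary_probs = aggregate(superclass_probs, SUPERCLASS_TO_BINARY)
predicted_binary = max(binary_probs, key=binary_probs.get)
```

Because each child belongs to exactly one parent, the aggregated probabilities still sum to one at every level, so a single subclass-level model yields predictions for all coarser tiers without retraining.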

Evaluation metrics are:

  • Primary: Weighted F1-Score

$$\text{Weighted F1} = \sum_{i=1}^{C} w_{i}\, \frac{2\,\text{Precision}_i\,\text{Recall}_i}{\text{Precision}_i + \text{Recall}_i}$$

where $C$ is the number of classes and $w_{i}$ is the normalized support.
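
For concreteness, the metric can be computed as in the sketch below, which takes $w_i$ to be the share of true samples in class $i$ and sets the per-class F1 to zero when precision and recall are both zero. The function name and the ten toy labels are illustrative; scikit-learn's `f1_score` with `average="weighted"` is the usual library route.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    n = len(y_true)
    support = Counter(y_true)  # true samples per class
    total = 0.0
    for cls, count in support.items():
        true_pos = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / count
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        total += (count / n) * f1  # w_i = normalized support
    return total

# Toy binary example (labels invented): 6 benign and 4 malignant lesions.
y_true = ["benign"] * 6 + ["malignant"] * 4
y_pred = ["benign"] * 5 + ["malignant"] + ["malignant"] * 3 + ["benign"]
score = weighted_f1(y_true, y_pred)
print(f"weighted F1 = {score:.3f}")
```

Here the benign class has F1 = 5/6 with weight 0.6 and the malignant class has F1 = 0.75 with weight 0.4, giving a weighted F1 of 0.8.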

Key benchmark findings include the "granularity gap": general-purpose medical vision models (e.g., MedImageInsights) achieve high binary screening accuracy (97.52% Weighted F1) but lower subclass discrimination (~65.5%), while models pretrained for dermatology (e.g., Derm Foundation, MedSigLip, MONET) reach higher subclass accuracy (~69.5%) yet trail in coarse labels.

7. Code Examples and Practical Usage

PyTorch and TensorFlow data pipeline implementations are provided to facilitate robust usage:

PyTorch Sample (stratified 70/15/15 split of the metadata with pandas and scikit-learn):

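from sklearn.model_selection import train_test_split
import pandas as pd

# Load the metadata table (one row per image).
meta = pd.read_csv("DERM38_metadata.csv")

# Note: stratified splitting needs at least two samples of every subclass
# in the frame being split.
# Hold out 30% of the images, stratified by subclass, as a temporary pool.
train_df, temp_df = train_test_split(
    meta, test_size=0.30, stratify=meta['subclass_label'], random_state=42
)
# Halve the pool into validation and test sets (15% each of the full dataset).
val_df, test_df = train_test_split(
    temp_df, test_size=0.50, stratify=temp_df['subclass_label'], random_state=42
)

The complete dataset can be accessed (under a planned CC BY 4.0 license) via a Zenodo DOI link. Sensitive additional fields (e.g., age, sex) are available upon request from the dataset authors.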

8. Key Insights and Limitations

DERM12345 exhibits significant class imbalance: benign nevi (Compound, Junctional) comprise ~40% of cases, while rare malignancies individually constitute <1%. Notable subclass confusions, such as between dysplastic and banal compound nevi (the "blob problem"), persist even in top-performing models, with ≥25% embedding overlap observed.

For clinical applications, coarse-grained tasks (binary, superclass) are approachable by general medical vision encoders, but fine-grained classification (40-way) demonstrably requires specialized dermatology pretraining or large-scale medical embedding strategies. Comparing diverse adapter architectures (MLP, XGBoost, SVM) is critical for assessing representation quality; MLP adapters consistently yield the best results for fine-grained classification.

A plausible implication is that model selection in algorithmic dermatology must be domain- and task-specific, matching representation strategy to diagnostic granularity demanded by the target workflow.


DERM12345 represents a rigorous, hierarchically structured, and expertly annotated benchmark resource for dermatologic machine learning, facilitating standardized evaluation across the full spectrum of clinical diagnostic tasks (Yilmaz et al., 2024, Yuceyalcin et al., 18 Jan 2026).
