DERM12345 Dermatoscopic Imaging Dataset

Updated 25 January 2026

DERM12345 is a comprehensive, multi-institutional dataset featuring 12,345 high-resolution dermatoscopic images annotated via a four-tier hierarchical diagnostic taxonomy.
The dataset supports a range of tasks from binary malignancy detection to 40-way fine-grained subclass classification, enabling rigorous evaluation of machine learning models.
Collected over 12 years across tertiary clinics in Türkiye, it includes expert annotations, stratified splits, and preprocessing guidelines for robust deep learning applications.

The DERM12345 dataset is a multi-institutional dermatoscopic image resource comprising 12,345 high-resolution images of skin lesions, annotated across a hierarchical diagnostic taxonomy. Designed to enable nuanced evaluation of machine learning models in dermatological imaging, DERM12345 supports both coarse malignancy detection and fine-grained differential diagnosis with 40 annotated lesion subclasses. Data collection spanned 12 years across tertiary clinics in Türkiye, encompassing a diverse patient population and imaging modalities. This dataset is established as the basis for advanced benchmarking, including hierarchical model evaluation frameworks in recent foundation model studies (Yilmaz et al., 2024, Yuceyalcin et al., 18 Jan 2026).

1. Dataset Structure and Taxonomy

DERM12345 contains dermatoscopic photographs, each corresponding to a unique lesion, collected between 2008 and 2020 in Manisa and Istanbul. Image resolution depends on the capture device (ranging from 2000 × 1500 to 3840 × 2160 pixels), so images must be resized before they enter a deep learning pipeline.

Images are annotated according to a four-tier clinical taxonomy:

Level	Number of labels	Examples
Subclass	40	Acral Nodular Melanoma, Seborrheic Keratosis, BCC, SCC
Main Class	15	Compound Nevus, Melanoma
Superclass	4	Melanocytic Benign, Melanocytic Malignant, Non-melanocytic Benign, Non-melanocytic Malignant
Binary Malignancy	2	Malignant, Benign

The subclass definitions enable fine-grained tasks crucial for differential diagnosis, moving beyond traditional binary (malignant vs benign) paradigms. These tiers underpin the hierarchical benchmarking protocols instituted in downstream studies (Yuceyalcin et al., 18 Jan 2026).

2. Data Acquisition and Sources

Image acquisition utilized a combination of MoleMax HD, FotoFinder® videodermatoscopes, and 3Gen DermLite DL4 devices attached to either mobile or DSLR cameras. All cases originated from clinical workflows in three dermatology centers:

Celal Bayar University, Manisa
Istinye University – Liv Hospital Vadistanbul, Istanbul
University of Health Sciences – Haydarpaşa Numune Hospital, Istanbul

No publicly available datasets were included, ensuring patient diversity typical of the Europe–Asia transition zone (Fitzpatrick skin types II–IV). Modalities included both polarized and non-polarized dermoscopy.

3. Annotation Protocol and Expert Agreement

Initial annotation combined automated extraction and manual metadata review, performed by trained engineers. Board-certified dermatologists with over 20 years of dermoscopy experience (G.G., S.P.Y.) rendered consensus diagnoses on all cases. Malignant lesions were mandatorily biopsy-confirmed; benign and dysplastic diagnoses were validated by either ≥ 2 years of digital follow-up or clinical consensus.

Discrepancies were adjudicated directly by the annotating dermatologists. Inter-rater agreement can additionally be quantified with Cohen’s κ, computed on a random subsample of cases:

$\kappa = \frac{p_o - p_e}{1 - p_e}$

where $p_o$ and $p_e$ denote observed and expected agreement, respectively.
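As a minimal sketch of the formula above, the function below computes κ for two raters from their label lists. The reader labels are made-up illustrative values, not data from DERM12345, and the function name `cohens_kappa` is a hypothetical helper.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement p_o: fraction of cases with identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement p_e: chance agreement from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every case
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels for six lesions from two readers (illustrative only).
reader_1 = ["nevus", "nevus", "melanoma", "bcc", "nevus", "melanoma"]
reader_2 = ["nevus", "melanoma", "melanoma", "bcc", "nevus", "nevus"]
kappa = cohens_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.3f}")
```

Here the readers agree on four of six cases ($p_o = 4/6$) while chance agreement is $p_e = 14/36$, so κ comes out well below the raw agreement rate.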

4. Metadata Schema and File Format

All images are stored in JPEG (.jpg, 8-bit) with standardized naming:

DERM38_<CenterCode><DeviceCode><PatientID>_<ImageID>.jpg

Metadata is maintained in CSV, with these principal fields:

file_name
super_class
main_class
subclass_label
patient_id (anonymized)
lesion_location (e.g., “dorsum of hand”)
device_type (e.g., “MoleMaxHD”)
capture_date
biopsy_confirmed (yes/no)
follow_up_months

This rich schema supports flexible downstream stratification and cohort selection.
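A short sketch of how the schema might be checked and used for cohort selection follows. It assumes the released CSV uses exactly the column names listed above, which is not confirmed here; the two sample rows and the helper `load_metadata` are invented for illustration.

```python
import csv
import io

# Column names as listed above; their exact spelling in the released
# CSV is an assumption.
EXPECTED_FIELDS = [
    "file_name", "super_class", "main_class", "subclass_label",
    "patient_id", "lesion_location", "device_type", "capture_date",
    "biopsy_confirmed", "follow_up_months",
]


def load_metadata(handle):
    """Read metadata rows and fail early if an expected column is missing."""
    reader = csv.DictReader(handle)
    missing = [f for f in EXPECTED_FIELDS if f not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"metadata is missing columns: {missing}")
    return list(reader)


# Made-up two-row sample standing in for the real metadata file.
sample = io.StringIO(
    "file_name,super_class,main_class,subclass_label,patient_id,"
    "lesion_location,device_type,capture_date,biopsy_confirmed,follow_up_months\n"
    "example_0001.jpg,melanocytic benign,compound nevus,compound nevus,"
    "P0001,back,MoleMaxHD,2015-03-02,no,24\n"
    "example_0002.jpg,non-melanocytic malignant,bcc,bcc,"
    "P0002,dorsum of hand,MoleMaxHD,2018-11-20,yes,0\n"
)
rows = load_metadata(sample)

# Example cohort selection: keep only biopsy-confirmed lesions.
biopsy_rows = [r for r in rows if r["biopsy_confirmed"] == "yes"]
print(len(rows), len(biopsy_rows))
```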

5. Data Splitting and Preprocessing

A stratified data partition is recommended to preserve subclass frequency distribution. Commonly implemented splits are:

Training: 70% of images
Validation: 15%
Test: 15%

Alternatively, dedicated benchmarks employ a 9,860/2,485 split (train/test) with five-fold cross-validation on the training set (stratification by subclass label) (Yuceyalcin et al., 18 Jan 2026).
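The five-fold, subclass-stratified cross-validation described above can be sketched in plain Python as follows. This is an illustrative stand-in rather than the benchmark's own code: `stratified_folds` is a hypothetical helper, the label counts are toy values, and in practice scikit-learn's `StratifiedKFold` serves the same purpose.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=42):
    """Assign each sample index to a fold so every fold mirrors the label mix."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    rng = random.Random(seed)
    fold_of = [None] * len(labels)
    offset = 0
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Deal indices round-robin; the running offset keeps fold sizes even
        # when class counts are not multiples of n_folds.
        for i, idx in enumerate(indices):
            fold_of[idx] = (offset + i) % n_folds
        offset += len(indices)
    return fold_of


# Toy labels (hypothetical counts): 20 common, 10 less common, 5 rare.
labels = ["compound_nevus"] * 20 + ["seborrheic_keratosis"] * 10 + ["melanoma"] * 5
fold_of = stratified_folds(labels)

for k in range(5):
    val_idx = [i for i, f in enumerate(fold_of) if f == k]
    train_idx = [i for i, f in enumerate(fold_of) if f != k]
    print(f"fold {k}: {len(train_idx)} train / {len(val_idx)} validation")
```

Each fold serves once as the validation set while the other four form the training set; the rare class appears in every fold, which is the point of stratifying by subclass.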

Preprocessing steps for deep learning include resizing images to target resolutions (e.g., 224×224 or 256×256), followed by normalization (ImageNet mean/std or model-specific constants). During benchmarking, embeddings are precomputed from fixed crops; no real-time augmentation or cropping is utilized at the embedding step.
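A minimal sketch of the resize-and-normalize step, assuming Pillow and NumPy are available and that the standard ImageNet channel statistics are the chosen constants. The function name `preprocess` and the synthetic black image are illustrative only.

```python
import numpy as np
from PIL import Image

# Standard ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(image, size=224):
    """Resize a PIL image to size x size and return a normalized CHW array."""
    # Resize with Pillow's default resampling filter.
    resized = image.convert("RGB").resize((size, size))
    pixels = np.asarray(resized, dtype=np.float32) / 255.0  # HWC, values in [0, 1]
    normalized = (pixels - IMAGENET_MEAN) / IMAGENET_STD
    return normalized.transpose(2, 0, 1)  # channels-first layout


# Synthetic all-black 2000 x 1500 image standing in for a dermatoscopic photo.
dummy = Image.fromarray(np.zeros((1500, 2000, 3), dtype=np.uint8))
tensor = preprocess(dummy)
print(tensor.shape, tensor.dtype)
```

A direct resize to a square changes the aspect ratio of 4:3 or 16:9 captures; cropping to a square first is a common alternative, and which one a given model expects should be checked against that model's documentation.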

6. Hierarchical Benchmarking and Evaluation

DERM12345 underpins a four-level hierarchical evaluation, as formalized in recent foundation model studies (Yuceyalcin et al., 18 Jan 2026). Models are trained at the finest subclass level (40-way), with aggregate predictions derived via probability summation over child subclasses per parent label:

$p_{\text{Parent}} = \sum_{s \in \{\text{Child Subclasses}\}} p_{s}$
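The summation above can be sketched as a small helper that rolls subclass probabilities up to any parent level. The four-subclass mapping and the probabilities below are a hypothetical slice chosen for illustration; the real taxonomy has 40 subclasses, and `aggregate_to_parents` is not a name from the benchmark code.

```python
def aggregate_to_parents(subclass_probs, parent_of):
    """Sum subclass probabilities into their parent labels."""
    parent_probs = {}
    for subclass, prob in subclass_probs.items():
        parent = parent_of[subclass]
        parent_probs[parent] = parent_probs.get(parent, 0.0) + prob
    return parent_probs


# Hypothetical subclass-to-binary mapping for four of the 40 subclasses.
parent_of = {
    "acral_nodular_melanoma": "malignant",
    "bcc": "malignant",
    "scc": "malignant",
    "seborrheic_keratosis": "benign",
}

# Made-up model output for one image (probabilities sum to 1).
subclass_probs = {
    "acral_nodular_melanoma": 0.35,
    "bcc": 0.15,
    "scc": 0.05,
    "seborrheic_keratosis": 0.45,
}

binary_probs = aggregate_to_parents(subclass_probs, parent_of)
top_subclass = max(subclass_probs, key=subclass_probs.get)
prediction = max(binary_probs, key=binary_probs.get)
print(top_subclass, prediction)
```

In this example the single most likely subclass is benign, yet the summed malignant probability (0.55) exceeds the benign one (0.45), so the aggregated binary prediction differs from the parent of the top subclass.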

Evaluation metrics are:

Primary: Weighted F1-Score

$\text{Weighted F1} = \sum_{i=1}^{C} w_{i}\, \frac{2\,\text{Precision}_i\,\text{Recall}_i}{\text{Precision}_i + \text{Recall}_i}$

where $C$ is the number of classes and $w_{i}$ is the normalized support.

Secondary: Balanced Accuracy (identifies failures on rare subclasses).
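Both metrics can be sketched in plain Python as below. Balanced accuracy is taken here in its common definition, the unweighted mean of per-class recall, which is an assumption about the benchmark's exact formula; the ten labels are toy values and the function names are hypothetical.

```python
def per_class_stats(y_true, y_pred):
    """Recall, F1 and support for every class present in y_true."""
    stats = {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        stats[c] = {"recall": recall, "f1": f1, "support": tp + fn}
    return stats


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by normalized support w_i."""
    stats = per_class_stats(y_true, y_pred)
    n = len(y_true)
    return sum(s["support"] / n * s["f1"] for s in stats.values())


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall."""
    stats = per_class_stats(y_true, y_pred)
    return sum(s["recall"] for s in stats.values()) / len(stats)


# Toy example: 8 benign and 2 malignant cases, one malignant case missed.
y_true = ["benign"] * 8 + ["malignant"] * 2
y_pred = ["benign"] * 8 + ["benign", "malignant"]
wf1 = weighted_f1(y_true, y_pred)
bacc = balanced_accuracy(y_true, y_pred)
print(f"weighted F1 = {wf1:.3f}, balanced accuracy = {bacc:.3f}")
```

The weighted F1 stays high because the majority class dominates the weights, while balanced accuracy drops to 0.75 since the missed malignant case halves the recall of the rare class.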

Key benchmark findings include the "granularity gap": general-purpose medical vision models (e.g., MedImageInsights) achieve high binary screening accuracy (97.52% Weighted F1) but lower subclass discrimination (~65.5%), while models pretrained for dermatology (e.g., Derm Foundation, MedSigLip, MONET) reach higher subclass accuracy (~69.5%) yet trail in coarse labels.

7. Code Examples and Practical Usage

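The sample below uses pandas and scikit-learn to build the stratified 70/15/15 split from the metadata CSV; the resulting data frames can then feed a PyTorch or TensorFlow input pipeline:

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the metadata table (one row per image).
meta = pd.read_csv("DERM38_metadata.csv")

# Hold out 30% of the images, stratified by subclass so rare classes stay represented.
train_df, temp_df = train_test_split(
    meta, test_size=0.30, stratify=meta['subclass_label'], random_state=42
)
# Split the held-out 30% in half: 15% validation and 15% test.
# Stratifying needs at least two images per subclass in the frame being split.
val_df, test_df = train_test_split(
    temp_df, test_size=0.50, stratify=temp_df['subclass_label'], random_state=42
)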

The complete dataset is accessible via a Zenodo DOI link under a planned CC BY 4.0 license. Additional sensitive fields (e.g., age, sex) are available from the dataset authors on request.

8. Key Insights and Limitations

DERM12345 exhibits significant class imbalance: benign nevi (Compound, Junctional) comprise ~40% of cases, while rare malignancies individually constitute <1%. Notable subclass confusions, such as between dysplastic and banal compound nevi (the "blob problem"), persist even in top-performing models, with ≥25% embedding overlap observed.

For clinical applications, coarse-grained tasks (binary, superclass) are approachable with general medical vision encoders, but fine-grained classification (40-way) requires specialized dermatology pretraining or large-scale medical embedding strategies. Comparing several adapter architectures (MLP, XGBoost, SVM) is critical for assessing representation quality; MLP adapters consistently yield the best results for fine-grained classification.

A plausible implication is that model selection in algorithmic dermatology must be domain- and task-specific, matching the representation strategy to the diagnostic granularity demanded by the target workflow.

DERM12345 represents a rigorous, hierarchically structured, and expertly annotated benchmark resource for dermatologic machine learning, facilitating standardized evaluation across the full spectrum of clinical diagnostic tasks (Yilmaz et al., 2024, Yuceyalcin et al., 18 Jan 2026).


References

Yilmaz et al. (2024). DERM12345: A Large, Multisource Dermatoscopic Skin Lesion Dataset with 38 Subclasses.

Yuceyalcin et al. (2026). A Hierarchical Benchmark of Foundation Models for Dermatology.
