
TikHarm: Multimodal WBC Imaging Dataset

Updated 29 November 2025
  • TikHarm Dataset is a multimodal resource integrating three bright-field channels and one fluorescence channel to advance white blood cell classification.
  • The dataset supports state-of-the-art techniques such as MM-MIMO with knowledge distillation, achieving F1 scores above 95% with reduced computational overhead.
  • It enables systematic evaluation of fusion strategies, model compression, and expert knowledge distillation, paving the way for efficient, clinically relevant cytological machine learning.

The TikHarm Dataset is a first-of-its-kind multimodal resource specifically designed for developing and benchmarking high-accuracy automated white blood cell (WBC, or leukocyte) classification systems. It integrates multiple imaging modalities—including three distinct bright-field channels and a fluorescence channel—captured from identical blood cell specimens, supporting advanced multimodal architectures and enabling systematic study of fusion strategies, model compression, and expert knowledge distillation. The dataset has provided the foundation for a state-of-the-art multimodal deep learning framework and offers unprecedented opportunities for robust, efficient, and clinically relevant cytological machine learning research (Yang et al., 2022).

1. Dataset Composition and Modalities

The TikHarm Dataset comprises meticulously curated paired images of individual WBCs, spanning the principal clinical classes:

  • Modalities:
    • Bright-field (Color): Three channels (Color₁, Color₂, Color₃), each acquired under a distinct illumination spectrum, preserving independent morphological and staining information.
    • Fluorescence (AO dye): A single fluorescence image per cell delivers additional functional and nuclear detail.
  • Scale and Class Distribution:
    • Each modality contains 14,912 images (one per unique cell).
    • Class breakdown: Neutrophil (9,616), Lymphocyte (4,448), Eosinophil (677), Monocyte (124), Basophil (47).
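The reported class counts sum exactly to the stated per-modality total, and the roughly 200:1 Neutrophil-to-Basophil ratio makes the imbalance worth handling explicitly. A quick consistency check, with illustrative inverse-frequency class weights (the weighting scheme is an assumption for illustration, not something the source specifies):

```python
# Class counts as reported for the TikHarm WBC dataset (per modality).
class_counts = {
    "Neutrophil": 9616,
    "Lymphocyte": 4448,
    "Eosinophil": 677,
    "Monocyte": 124,
    "Basophil": 47,
}

total = sum(class_counts.values())
assert total == 14912  # matches the stated per-modality image count

# Inverse-frequency weights -- one common (illustrative, not source-specified)
# remedy for this degree of imbalance: Basophil is ~200x rarer than Neutrophil.
weights = {c: total / n for c, n in class_counts.items()}
```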

Preprocessing for all modalities involves center-cropping to 224×224 pixels, followed by RandAugment (N=2, M=18), and 5-fold cross-validation splitting.
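A minimal sketch of the cropping step, with the subsequent augmentation and fold-splitting noted in comments (the torchvision call mentioned is an assumed implementation of the stated N=2, M=18 setting):

```python
import numpy as np

def center_crop(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Center-crop an HxWxC image to size x size, as applied to every modality."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

# The reference pipeline then applies RandAugment with N=2 ops at magnitude
# M=18 -- e.g. torchvision.transforms.RandAugment(num_ops=2, magnitude=18) --
# and partitions the cells into 5 cross-validation folds.
cropped = center_crop(np.zeros((300, 320, 3), dtype=np.uint8))
```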

2. Multimodal Fusion Benchmarks

The dataset supports rigorous evaluation of a spectrum of fusion paradigms:

  • Early Fusion: Stacks four matched-modality images along the channel axis, processed by a single backbone yielding fused predictions. Supervised with standard cross-entropy loss.
  • Late Fusion: Trains independent modality-specific backbones, averages their softmax outputs post-inference for ensemble decision-making.
  • MM-MIMO Framework: Embeds modality-specific subnetworks within a single shared backbone. Each subnetwork learns features for one modality, while strict parameter sharing keeps the computational cost close to that of a single network.

The MM-MIMO approach leverages the independence of modality-specific representations while eliminating the computational overhead of late fusion.
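The two baseline fusion rules can be sketched in a few lines of NumPy (array shapes are illustrative; the 4-channel stacking follows the 4-channel input mentioned in Section 4):

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Late fusion: each of the 4 modality-specific backbones emits logits over
# the 5 WBC classes; their softmax outputs are averaged post-inference.
logits = rng.normal(size=(4, 5))          # one logit vector per modality
late_probs = softmax(logits).mean(axis=0)
prediction = int(np.argmax(late_probs))

# Early fusion: the four co-registered images are instead stacked along the
# channel axis into one 4-channel input for a single backbone.
stacked = np.zeros((4, 224, 224), dtype=np.float32)
```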

3. Knowledge Distillation Strategy

A principal innovation supported by TikHarm is cross-modal knowledge distillation:

  • Teacher-Student Setup: The high-capacity late-fusion ensemble acts as the teacher; the parameter-efficient MM-MIMO model is the student.
  • Distillation Objective:
    • Soft targets are derived via a temperature-scaled softmax, $p_s(z_j; T) = \exp(z_j/T) / \sum_k \exp(z_k/T)$.
    • The loss is a sum of cross-entropy on true labels and a Kullback–Leibler divergence on softened outputs:

    $L_T = \sum_{i\in\{f,1,2,3\}} L_{CE}\big(y_i, p_{\theta}(y_i \mid x_i)\big) + T^2 \sum_{i} \mathrm{KL}\big[p_s(\hat{z}_i; T) \,\|\, p_s(z_i; T)\big]$

    with $T$ typically chosen in $[2, 8]$.

This protocol enables the student network to approximate or surpass ensemble-level performance with drastically reduced complexity.
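The distillation term can be sketched directly in NumPy. The snippet assumes $\hat{z}_i$ denotes the teacher (late-fusion ensemble) logits and $z_i$ the student (MM-MIMO) logits, and shows a single modality's contribution:

```python
import numpy as np

def softmax_T(z: np.ndarray, T: float) -> np.ndarray:
    """Temperature-scaled softmax p_s(z; T) = exp(z/T) / sum_k exp(z_k/T)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_term(teacher_logits: np.ndarray, student_logits: np.ndarray, T: float) -> float:
    """T^2 * KL[p_s(teacher; T) || p_s(student; T)] for one modality.

    The full loss L_T sums this term over the modalities i in {f, 1, 2, 3}
    and adds the cross-entropy on the true labels.
    """
    p = softmax_T(teacher_logits, T)
    q = softmax_T(student_logits, T)
    return T**2 * float(np.sum(p * np.log(p / q)))
```

With identical teacher and student logits the term vanishes; the $T^2$ factor keeps gradient magnitudes comparable across temperatures, one reason $T$ can be swept over $[2, 8]$ without rebalancing the loss.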

4. Model Architectures, Training, and Evaluation

Experiments within the TikHarm framework have utilized efficient CNN backbones (ShuffleNet V2 and ResNet-34), with comprehensive ablations on fusion and distillation methods:

  • Training Details:

    • Data augmentation and 5-fold cross-validation as detailed above.
    • Batch size and optimizer settings are chosen so that the 4-channel fused input fits within GPU memory constraints.
  • Performance Metrics:
    • Weighted F1-score, sensitivity (recall), specificity, and AUC, mean±std over 5 folds.
  • Key Results:
    • Best single modality: F1 ≈ 95.3–95.5%.
    • Late fusion surpasses early fusion by ~0.4–0.6% F1.
    • MM-MIMO with cross-modal distillation achieves F1=95.99% (ShuffleNet V2) and F1=96.13% (ResNet-34), slightly surpassing late fusion, and matching or exceeding its sensitivity, specificity, and AUC.
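The headline metric, weighted F1, averages per-class F1 scores weighted by class support. A self-contained sketch (equivalent to `sklearn.metrics.f1_score` with `average="weighted"`):

```python
import numpy as np

def weighted_f1(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int = 5) -> float:
    """Support-weighted F1 over the 5 WBC classes, as reported in the benchmarks."""
    f1s, supports = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
        supports.append(np.sum(y_true == c))
    return float(np.average(f1s, weights=supports))
```

Reporting mean ± std over the 5 folds then reduces to averaging five such scores.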

5. Model Complexity and Computational Efficiency

A core deliverable of the TikHarm framework is a quantitative analysis of cost vs. accuracy tradeoffs:

| Backbone | Variant | FLOPs | Params (M) | Relative size | F1 (%) |
|---|---|---|---|---|---|
| ShuffleNet V2 | Single-modality | 147.8 M | 1.26 | — | 95.33 |
| ShuffleNet V2 | Early fusion | 172.2 M | 1.26 | 1.2× | — |
| ShuffleNet V2 | Late fusion | 591.2 M | 5.04 | 3.4× | 95.77 |
| ShuffleNet V2 | MM-MIMO+KD | 172.2 M | 1.28 | 1.2× | 95.99 |
| ResNet-34 | Single-modality | 3,670 M | 21.3 | — | 95.48 |
| ResNet-34 | Early fusion | 4,025 M | 21.3 | 1.1× | — |
| ResNet-34 | Late fusion | 14,683 M | 85.1 | 3.6× | 96.04 |
| ResNet-34 | MM-MIMO+KD | 4,025 M | 21.3 | 1.1× | 96.13 |

MM-MIMO+KD matches or exceeds late-fusion accuracy with roughly 3.4–3.6× fewer FLOPs, ≈4× fewer parameters, and ≈4× faster inference.
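The reported ≈4× inference speedup is consistent with the FLOPs ratios of late fusion to MM-MIMO+KD in the table:

```python
# FLOPs ratio of late fusion to MM-MIMO+KD (values from the table above).
shufflenet_ratio = 591.2 / 172.2      # ShuffleNet V2: ~3.43x
resnet_ratio = 14_683 / 4_025         # ResNet-34:     ~3.65x
assert round(shufflenet_ratio, 2) == 3.43
assert round(resnet_ratio, 2) == 3.65
```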

6. Impact and Significance

The TikHarm Dataset represents the first public multimodal white blood cell imaging resource enabling end-to-end, low-complexity, and high-accuracy cytological classification (Yang et al., 2022). This dataset advances:

  • Standardization in multimodal WBC recognition, surpassing previous works confined to single-modality imaging.
  • Rapid prototyping and scalable deployment of models suitable for real-world, resource-constrained clinical settings.
  • New paradigms in model design, specifically the use of explicit modality-specialist subnetworks and cross-modal distillation for optimal efficiency–performance trade-off.

TikHarm sets a new technical baseline for multimodal hematological imaging and supports future research on explainable models, real-time diagnostics, and robust learning under variable imaging protocols.
