
HemBLIP: VLM for Hematology Diagnostics

Updated 14 January 2026
  • HemBLIP is a family of vision-language models that integrates Vision Transformer and Transformer decoder for accurate cell morphology captioning and non-invasive haemoglobin estimation.
  • It employs both full fine-tuning and LoRA adaptations, reducing trainable parameters by over 95% and improving caption BLEU scores from 0.24 to 0.27.
  • The system supports clinical workflows with explainable outputs and mobile point-of-care screening, achieving an MAE of 0.85 g/dL for haemoglobin prediction.

HemBLIP denotes a family of vision-language models (VLMs) and mobile-health systems for interpretable hematological diagnostics, designed to describe and quantify cell morphology for leukemia diagnosis and to enable non-invasive estimation of blood biomarkers such as haemoglobin. HemBLIP employs deep learning, structured expert annotation, and parameter-efficient adaptation techniques to deliver clinically relevant, explainable predictions, supporting both expert workflows and point-of-care applications (Logtestijn et al., 7 Jan 2026, Sarah et al., 2020).

1. Core Model Architecture and Adaptation

HemBLIP builds on the BLIP vision-language paradigm, integrating a Vision Transformer (ViT) as image encoder and a Transformer-based language decoder. The architecture is instantiated and fine-tuned for cell-level morphological captioning:

  • Image Encoder: ViT as proposed by Dosovitskiy et al. (2020), mapping high-resolution microscopy images to embeddings $x \in \mathbb{R}^d$.
  • Language Decoder: a Transformer decoder generates token sequences $y = (y_1, \ldots, y_T)$ corresponding to morphological descriptions.

Adaptation modalities include:

  • Full Fine-Tuning: All parameters updated to specialize for cell description generation.
  • LoRA Parameter-Efficient Adaptation: applies Low-Rank Adaptation (LoRA), updating only low-rank matrices $A \in \mathbb{R}^{d \times r}$, $B \in \mathbb{R}^{r \times d}$ such that $W' = W + AB$, reducing trainable parameters by ∼95–99% along with the computational requirements.
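The low-rank update and the resulting parameter savings can be sketched in a few lines of numpy. The hidden size d and rank r below are illustrative values, not figures reported for HemBLIP:

```python
import numpy as np

def lora_update(W, A, B):
    """Apply the LoRA update W' = W + A @ B to a frozen weight matrix."""
    return W + A @ B

d, r = 768, 8  # illustrative hidden size and LoRA rank
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((d, r)) * 0.01   # trainable low-rank factor
B = np.zeros((r, d))                     # zero-initialised, so W' == W at the start

W_prime = lora_update(W, A, B)

full_params = d * d            # parameters touched by full fine-tuning of this layer
lora_params = d * r + r * d    # parameters touched by LoRA
print(f"trainable fraction: {lora_params / full_params:.2%}")  # → trainable fraction: 2.08%
```

Initialising B to zero is the usual LoRA convention: the adapted model starts out identical to the pretrained one, and only the low-rank factors drift during training.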

Pseudocode:

# Full fine-tuning: all ViT and decoder parameters are updated
for epoch in range(N):
    for image, caption in Train:
        optimizer.zero_grad()
        z = ViT(image)
        y_hat = Decoder(z)
        L = CrossEntropy(y_hat, caption)
        L.backward()
        optimizer.step()

# LoRA adaptation: base weights frozen, only A and B receive gradients
for epoch in range(N):
    for image, caption in Train:
        optimizer.zero_grad()
        z = ViT(image)            # encoder frozen
        y_hat = Decoder(z, A, B)  # only LoRA matrices active
        L = CrossEntropy(y_hat, caption)
        L.backward()
        optimizer.step()

The MedGEMMA baseline employs a SigLIP vision tower with a medically aligned decoder for benchmarking (Logtestijn et al., 7 Jan 2026).

2. Dataset Construction and Annotation Protocols

HemBLIP leverages a combined morphology-rich dataset comprising 14,659 annotated peripheral blood cell images. The dataset consists of two main subsets:

  • WBCAtt (Healthy): 7,037 expertly annotated white blood cells across five morphologies. Eleven categorical attributes are encoded, encompassing nuclear shape, chromatin texture, nucleoli, cytoplasmic indicators, granular features, cell size, and others.
  • LeukemiaAttri (Leukemic): 7,622 cells spanning acute (ALL, AML, APML) and chronic (CLL, CML) leukemia types. Seven attributes, including Auer rods and nucleus-to-cytoplasm ratio, are specified.

Each image receives an expert-derived caption structured around a fixed protocol: “A [size] cell with [chromatin texture] chromatin, [nucleoli description], [cytoplasmic amount], [granularity], [diagnosis if obvious].” The vocabulary consists of ~120 tokens with mean caption length ≈18 words.
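The fixed caption protocol amounts to a template fill over the annotated attributes. A minimal sketch, assuming a hypothetical helper (the function and argument names are illustrative, not the authors' code):

```python
def build_caption(size, chromatin, nucleoli, cytoplasm, granularity, diagnosis=None):
    """Assemble a structured morphology caption following the fixed protocol:
    'A [size] cell with [chromatin] chromatin, [nucleoli], [cytoplasm],
    [granularity], [diagnosis if obvious].'"""
    parts = [
        f"A {size} cell with {chromatin} chromatin,",
        f"{nucleoli},",
        f"{cytoplasm},",
        f"{granularity}",
    ]
    caption = " ".join(parts)
    if diagnosis:
        caption += f", {diagnosis}"
    return caption + "."

print(build_caption("medium", "open", "two prominent nucleoli",
                    "scant cytoplasm", "agranular",
                    "consistent with acute lymphoblastic leukemia"))
```

A fixed template like this keeps the ~120-token vocabulary closed, which is what makes downstream regex-based attribute checking tractable.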

Class distributions (training):

| | Healthy | Leukemic | Cell Size (S/M/L) | Chromatin (Coarse/Fine) | Leukemia Subtypes |
|---|---|---|---|---|---|
| % of train samples | ~50 | ~50 | 23 / 54 / 23 | 49 / 51 | 20 / 25 / 15 / 20 / 20 |

The dataset split is 80% train, 10% validation, 10% internal test.
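An 80/10/10 split over the 14,659 images can be sketched as follows; the helper and seed are illustrative (in practice a split of this kind would be stratified by class, which this sketch omits):

```python
import random

def split_dataset(items, seed=0):
    """Shuffle and split into 80% train, 10% validation, 10% internal test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]

train, val, test = split_dataset(range(14659))
print(len(train), len(val), len(test))  # → 11727 1465 1467
```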

3. Training Objectives and Optimization Strategies

HemBLIP adopts multi-task objectives:

  • Caption Generation Loss:

$L_\text{caption} = -\sum_{t=1}^{T} \log P(y_t \mid y_{<t}, x)$

  • Morphological Attribute Classification (Auxiliary):

$L_\text{attr} = -\sum_{k=1}^{K} y^*_k \log y_k$

  • Combined Objective:

$L = \lambda_{cap} L_\text{caption} + \lambda_{morph} L_\text{attr}$

Typical weights are $\lambda_{cap} = 1.0$ and $\lambda_{morph} = 0.5$.
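As an illustration, the weighted objective can be evaluated on toy distributions; the probabilities and targets below are synthetic, not HemBLIP data:

```python
import numpy as np

def cross_entropy(probs, target_idx):
    """Mean negative log-likelihood of the target class indices."""
    return -np.mean(np.log(probs[np.arange(len(target_idx)), target_idx]))

# Toy predicted distributions over a 3-token vocabulary / 3 attribute classes
caption_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
attr_probs    = np.array([[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]])
caption_targets = np.array([0, 1])
attr_targets    = np.array([0, 2])

lam_cap, lam_morph = 1.0, 0.5  # typical weights from the paper
L = lam_cap * cross_entropy(caption_probs, caption_targets) \
    + lam_morph * cross_entropy(attr_probs, attr_targets)
print(round(L, 4))  # → 0.5453
```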

Optimization is performed via AdamW (learning rate $5 \times 10^{-5}$), with early stopping on validation loss.

4. Evaluation Metrics and Quantitative Results

HemBLIP is evaluated using both natural language and morphological accuracy metrics:

  • Caption Quality:
    • BLEU-1…4: n-gram overlap measure.
    • ROUGE-L: longest common subsequence (LCS).
    • BERTScore: cosine similarity of token-level BERT embeddings.
  • Morphological Feature Accuracy:
    • Regex-driven attribute extraction from generated captions; accuracy computed as fraction of correct matches over ground-truth mentions.
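A sketch of regex-driven attribute extraction and accuracy scoring; the attribute vocabularies and patterns below are illustrative, not the evaluation code used in the paper:

```python
import re

# Hypothetical per-attribute patterns over the closed caption vocabulary
ATTRS = {
    "cell_size": r"\b(small|medium|large)\b",
    "chromatin": r"\b(open|coarse|fine|dense)\b(?=[^.]*chromatin)",
    "cytoplasm": r"\b(scant|minimal|moderate|abundant)\b(?=[^.]*cytoplasm)",
}

def extract_attrs(caption):
    """Pull the first match for each attribute pattern out of a caption."""
    found = {}
    for name, pattern in ATTRS.items():
        m = re.search(pattern, caption.lower())
        if m:
            found[name] = m.group(1)
    return found

def attribute_accuracy(pred_caption, ref_caption):
    """Fraction of reference-mentioned attributes reproduced correctly."""
    pred, ref = extract_attrs(pred_caption), extract_attrs(ref_caption)
    if not ref:
        return 0.0
    return sum(pred.get(k) == v for k, v in ref.items()) / len(ref)

ref = "A medium cell with open chromatin, scant cytoplasm."
hyp = "A medium-sized cell showing open chromatin and minimal cytoplasm."
print(round(attribute_accuracy(hyp, ref), 3))  # → 0.667
```

Here the hypothesis matches cell size and chromatin texture but describes the cytoplasm as "minimal" rather than "scant", giving 2 of 3 attributes correct.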

Performance results (internal test):

| Model | BLEU | ROUGE-L | BERTScore | Cell Size (%) | Chromatin Texture (%) | Cytoplasm Amount (%) | Diagnosis Mention (%) |
|---|---|---|---|---|---|---|---|
| HemBLIP Full | 0.24 | 0.42 | 0.83 | 91.8 | 52.7 | 70.4 | 79.1 |
| HemBLIP LoRA | 0.27 | 0.49 | 0.86 | 85.4 | 55.5 | 69.9 | 82.3 |
| MedGEMMA LoRA | 0.31 | 0.52 | 0.87 | 81.2 | 59.3 | 57.4 | 54.1 |

LoRA adaptation trains ≈1.5M parameters (versus 180M for full fine-tuning), cutting the trainable-parameter count by >95%, GPU memory by ∼4×, and training time by ∼3×, while caption BLEU rises from 0.24 to 0.27.

5. Interpretability, Clinical Rationale, and Example Outputs

HemBLIP produces “explainable-by-design” outputs that enumerate key cytological attributes (nuclear shape, chromatin texture, nucleoli, cytoplasm features), mirroring hematologist reasoning and facilitating transparent downstream classification.

Example:

  • Input: medium-sized blast cell
  • Reference: “A medium cell with open chromatin, two prominent nucleoli, scant agranular cytoplasm; consistent with acute lymphoblastic leukemia.”
  • HemBLIP LoRA output: “A medium-sized round cell showing open, fine chromatin, visible prominent nucleoli, minimal clear cytoplasm—highly suggestive of a lymphoblast.”

Clinical implications include support for pathologist review, increased trust in automated systems, and utility as a teaching aid in resource-limited environments.

6. HemBLIP for Non-Invasive Haemoglobin Estimation

An independent mobile-health workflow extends the HemBLIP paradigm to non-invasive Hb measurement via smartphone-based multi-input analysis (Sarah et al., 2020):

  • Client App: Guides sequential capture of fingernail beds, palpebral conjunctiva, and tongue. Implements on-device OCR/NLP to extract CBC report values for supervised calibration.
  • Image Preprocessing: Illumination correction (RGB→YCbCr; CLAHE), white reference calibration using sclera pixels, ROI segmentation via SLIC and UNet-style architectures (encoder: 16–128 channels, separable conv, ReLU, batch norm).
  • Feature Extraction: Computes channel-wise statistics (mean, std, skewness, kurtosis), colour ratios (R/G, G/B), erythema index, CIELab and HSV parameters, and morphometric features.
  • Machine Learning Pipeline: Two-stage approach: (1) anaemia-severity classifier (Random Forest, SVM), (2) regression model (XGBoostRegressor, multi-linear). End-to-end CNN+MLP models deployed as alternatives.
  • Calibration: per-subject linear correction ($\mathrm{Hb}_{\text{cal}} = \alpha_j f + \beta_j$), fit via least squares, improves individualized accuracy.
  • Performance: MAE = 0.85 g/dL, RMSE = 1.10 g/dL, $R^2 = 0.88$ on a cross-validated cohort of 120 subjects; multi-ROI input outperforms single-ROI (paired t-test, $p < 0.01$).
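The per-subject correction is an ordinary least-squares line fit of lab Hb against the model feature. A numpy sketch on synthetic feature/Hb pairs (not real patient data):

```python
import numpy as np

def fit_calibration(f, hb):
    """Least-squares fit of Hb = alpha * f + beta for one subject."""
    X = np.column_stack([f, np.ones_like(f)])
    (alpha, beta), *_ = np.linalg.lstsq(X, hb, rcond=None)
    return alpha, beta

# Synthetic subject: model feature f vs. lab Hb (g/dL) with a known linear relation
f  = np.array([10.2, 11.0, 12.5, 13.1, 14.0])
hb = 0.9 * f + 1.5  # ground-truth slope and offset for this toy subject

alpha, beta = fit_calibration(f, hb)
print(round(alpha, 3), round(beta, 3))  # → 0.9 1.5
```

With real data from the app, f would be the uncalibrated model prediction and hb the OCR-extracted CBC report value for the same subject.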

Deployment uses quantized (8-bit) models for on-device inference and integrates a cloud backend for active learning and calibration storage.
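A minimal sketch of symmetric 8-bit weight quantization of the kind used for on-device inference (per-tensor scale; illustrative, not necessarily the exact scheme deployed):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric 8-bit quantization: map floats to int8 with a per-tensor scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_rec = dequantize(q, scale)
print(np.max(np.abs(w - w_rec)) < scale)  # → True (error bounded by one quantization step)
```

Storing `q` plus a single `scale` per tensor shrinks weights 4× relative to float32, which is what makes smartphone inference practical.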

7. Limitations and Prospective Extensions

Current HemBLIP models are constrained by dataset composition (peripheral blood smears only, excluding bone marrow aspirates and multi-stain modalities), domain shift (external-test ROUGE-L drops to ~0.25), and the lack of integrated demographic baselines.

Potential advancements include:

  • Expanding datasets to more tissue types and stains.
  • Incorporating patient metadata (age, sex, lab context).
  • Refining mobile workflows for diverse skin pigmentation and ambient lighting conditions.
  • Exploring federated learning for privacy-preserving continual model refinement.
  • Integrating low-cost illumination and higher-resolution lens attachments for enhanced acquisition.

This suggests HemBLIP systems can be generalized for scalable point-of-care hematology, with explainable outputs bridging the gap between automated analysis and expert interpretation in leukemia diagnosis and anaemia screening (Logtestijn et al., 7 Jan 2026, Sarah et al., 2020).
