
Automated HER2 Scoring in Diagnostics

Updated 2 February 2026
  • Automated HER2 scoring is a computational method that quantifies HER2 expression in tissue samples using digital image analysis across IHC, FISH, and H&E modalities.
  • It leverages deep learning, handcrafted features, and multi-modal fusion to mimic clinical scoring guidelines and reduce observer variability.
  • The approach integrates preprocessing, feature extraction, and postprocessing steps to reliably classify HER2 status, supporting biomarker-driven therapy decisions.

Automated HER2 scoring refers to computational methods for quantifying human epidermal growth factor receptor 2 (HER2) expression in histological or cytological samples, most commonly for breast cancer diagnostics. The central aim is the objective, reproducible assignment of standardized HER2 scores—typically 0, 1+, 2+, or 3+ per ASCO/CAP guidelines—directly from digitized images of immunohistochemically stained (IHC), fluorescence in situ hybridization (FISH), or, increasingly, hematoxylin and eosin (H&E) tissue slides. Automated systems address the substantial inter- and intra-observer variability inherent in manual pathology review, and are essential for triaging, biomarker-driven therapy, and large-scale standardized diagnostics.

1. Foundations and Modalities of Automated HER2 Scoring

Automated HER2 scoring workflows are implemented using digital pathology inputs, subdivided by staining modality: IHC, FISH, and H&E. Each modality presents distinct algorithmic requirements and clinical challenges.

  • IHC-based automation focuses on membrane staining intensity and completeness, mimicking manual scoring criteria. Representative systems use classical image analysis, handcrafted features, or deep learning, often trained on digitized whole-slide images or tissue microarrays. High-performing deep learning methods encode slide context, perform multiscale analysis, and provide patch- or cell-level predictions followed by aggregation and guideline-compliant postprocessing (Selcuk et al., 2024, Fanous et al., 2024, Qaiser et al., 2017, Pham et al., 2022).
  • FISH-based automation involves automated detection of nuclei, followed by quantification and classification of HER2 and reference signals (e.g., CEP17) in each nucleus, and finally applying per-guideline ratio thresholds to obtain amplification status. Fully automated FISH pipelines integrate instance segmentation (StarDist, U-Net variants), object detection for signal counting (RetinaNet), and interpretable reporting visualizations to match pathologist workflows (Schmell et al., 2020).
  • H&E-based and cross-modal automation infers HER2 status directly from routinely available H&E slides, bypassing the need for IHC/ISH. These methods exploit transfer learning, attention-based multiple-instance learning (MIL), large-scale weak supervision, and, in some cases, generative adversarial networks (GANs) to synthesize virtual IHC (Abdulsadig et al., 2024, Conde-Sousa et al., 2021, Rehmat et al., 23 Jun 2025). Joint H&E/IHC ViT pipelines and flexible fusion approaches support additional interpretability and robustness (Oyelade et al., 26 Dec 2025, Qin et al., 12 Apr 2025).

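The dual-probe FISH decision rule described above can be sketched in a few lines. The thresholds (HER2/CEP17 ratio ≥ 2.0, mean HER2 copy numbers of 4.0 and 6.0) follow the ASCO/CAP dual-probe criteria, but this is a simplified illustration: the full 2018 guideline defines additional equivocal groups requiring further workup, which are collapsed here into a single "equivocal" label.

```python
def fish_status(her2_counts, cep17_counts):
    """Classify HER2 amplification from per-nucleus FISH signal counts.

    Simplified sketch of the ASCO/CAP dual-probe rule; the full 2018
    guideline defines additional groups (2-4) that trigger further
    workup, collapsed here for illustration.
    """
    n = len(her2_counts)
    mean_her2 = sum(her2_counts) / n                 # mean HER2 copies/nucleus
    ratio = sum(her2_counts) / sum(cep17_counts)     # HER2/CEP17 signal ratio
    if ratio >= 2.0:
        # Amplified when copy number supports the ratio; otherwise workup.
        return "amplified" if mean_her2 >= 4.0 else "equivocal"
    # Ratio below 2.0 can still be amplified at very high copy number.
    return "amplified" if mean_her2 >= 6.0 else "not amplified"
```

In a full pipeline, `her2_counts` and `cep17_counts` would come from the per-nucleus instance segmentation and signal detection stages (e.g., StarDist/RetinaNet as in Schmell et al., 2020).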
2. Pipeline Architectures and Computational Strategies

Automated workflows vary in degree of system integration, architectural complexity, and adherence to pathologist guidelines.

  • Preprocessing and ROI Detection:
    • Common steps include color deconvolution (extracting DAB and hematoxylin), tissue segmentation, artifact removal, and slide-level tiling or dynamic field-of-view (FOV) selection.
    • Interactive microscope-based systems permit ROI selection and threshold adjustment by pathologists (Zhang et al., 2021, Zhang et al., 2020).
    • End-to-end methods incorporate automated ROI or tissue core localization (e.g., Hough-based core detection or deep CNN segmentation) for batch processing of TMAs and WSIs (Selcuk et al., 2024, Jha et al., 2024).
  • Feature Extraction and Classification:
    • Early methods rely on handcrafted color, intensity, and texture features, sometimes feeding simple classifiers (k-NN, SVM, MLP, decision trees) in a two-stage fashion: classify per patch, then aggregate at the slide level (Cordeiro et al., 2018).
    • Contemporary deep learning approaches employ CNNs (DenseNet-201, EfficientNet-B0/ConvNeXt, custom residual/LSTM hybrids) or ViTs, leveraging pyramid sampling, multiple scales, and attention to capture both global and subcellular detail (Selcuk et al., 2024, Chauhan et al., 28 Mar 2025, Oyelade et al., 26 Dec 2025).
    • MIL frameworks aggregate patch- or region-level embeddings via attention, supporting weak supervision and interpretability in the absence of precise annotations (Abdulsadig et al., 2024, Conde-Sousa et al., 2021).
    • Segmentation/refinement-based systems enforce per-guideline constraints on patch/tumor-surface percentages and allow fine-tuning at the output logit stage for guideline compliance (Pham et al., 2022).
  • Aggregation and Postprocessing:
    • Most systems follow formal aggregation rules: e.g., compute per-class proportions, select maximum surface area classes, or apply logical thresholds to percentages of labeled invasive tumor area in accordance with ASCO/CAP rules for HER2 positivity, negativity, and equivocal status (Zhang et al., 2021, Pham et al., 2022, Jha et al., 2024).
    • Confidence estimation and uncertainty quantification (via Bayesian MC dropout or consensus-based filtering) are increasingly incorporated to support result reliability, identifying cases that require pathologist review (Shen et al., 26 Jan 2026, Fanous et al., 2024).
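The color-deconvolution preprocessing step above can be sketched with the standard Ruifrok-Johnston stain matrix. This is a minimal numpy illustration of the same separation that `skimage.color.rgb2hed` performs, not any cited system's implementation.

```python
import numpy as np

# Ruifrok-Johnston optical-density vectors for hematoxylin, eosin, DAB.
STAINS = np.array([[0.65, 0.70, 0.29],
                   [0.07, 0.99, 0.11],
                   [0.27, 0.57, 0.78]])
STAINS /= np.linalg.norm(STAINS, axis=1, keepdims=True)

def deconvolve(rgb):
    """Separate an RGB tile (H, W, 3, uint8) into H/E/DAB density maps.

    Beer-Lambert: optical density OD = -log10(I / I0) is linear in stain
    concentration, so inverting the stain matrix recovers per-stain
    densities (the same idea as skimage.color.rgb2hed).
    """
    od = -np.log10(np.clip(rgb, 1, 255) / 255.0)       # optical density
    return od.reshape(-1, 3) @ np.linalg.inv(STAINS)   # (N, 3): H, E, DAB
```

Downstream IHC scoring then thresholds or classifies the DAB channel, with the hematoxylin channel used for nuclear context.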
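Attention-based MIL pooling of patch embeddings, as used in the weakly supervised pipelines above, can be sketched as follows (Ilse-style attention; the embedding and weight dimensions are illustrative, not those of any cited model):

```python
import numpy as np

def attention_mil_pool(patch_embs, V, w):
    """Attention-based MIL pooling over patch embeddings.

    Each patch embedding h_k receives a score w^T tanh(V h_k); a softmax
    over patches yields attention weights, and the slide embedding is the
    weighted sum. The weights double as a localization/interpretability map.
    """
    scores = np.tanh(patch_embs @ V.T) @ w      # (n_patches,)
    scores = scores - scores.max()              # numerical stability
    a = np.exp(scores)
    a = a / a.sum()                             # attention weights, sum to 1
    return a @ patch_embs, a                    # slide embedding, weights

rng = np.random.default_rng(0)
E = rng.normal(size=(50, 16))                   # 50 patches, 16-dim embeddings
V = rng.normal(size=(8, 16))                    # hypothetical learned params
w = rng.normal(size=8)
slide_emb, attn = attention_mil_pool(E, V, w)
```

In training, `V` and `w` are learned end to end with the slide-level label; at inference, `attn` highlights the patches driving the prediction.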
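Guideline-style aggregation with confidence filtering can be sketched as below. The 10% area rule echoes the ASCO/CAP membrane-staining criterion; the confidence threshold and the deferral fraction are illustrative assumptions, not values from any cited system.

```python
def slide_score(patch_labels, patch_conf, min_conf=0.7, frac=0.10):
    """Aggregate per-patch IHC scores (0, 1, 2, 3) to a slide-level score.

    Illustrative rule in the spirit of ASCO/CAP: report the highest score
    whose patches cover at least `frac` of the confident tumor area, and
    defer to a pathologist when too few patches pass the confidence filter.
    Thresholds here are assumptions for the sketch.
    """
    kept = [l for l, c in zip(patch_labels, patch_conf) if c >= min_conf]
    if len(kept) < 0.5 * len(patch_labels):
        return "review"                 # uncertainty-based deferral
    n = len(kept)
    for score in (3, 2, 1):             # highest qualifying score wins
        if kept.count(score) / n >= frac:
            return f"{score}+"
    return "0"
```

Real deployments replace the per-patch confidences with calibrated estimates (e.g., Bayesian MC dropout) before applying such a filter.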

3. Representative Automated Scoring Paradigms and Performance

A selection of leading approaches and their technical details is summarized below.

| Modality | Pipeline Highlights | Slide/Core-Level Accuracy | Notable Innovations |
| --- | --- | --- | --- |
| IHC (TMA) | Pyramid sampling + DenseNet-201 (Selcuk et al., 2024) | 84.7% (4-class) | Multi-scale sampling, confidence-based aggregation |
| IHC (blurred) | BlurryScope + eFIN (Fanous et al., 2024) | 79.3% (4-class); 89.7% (2-class) | Low-cost hardware, real-time, blur-robust DL |
| IHC (lensfree) | Lensfree holography + EfficientNet-B0 (Shen et al., 26 Jan 2026) | 84.9% (4-class); 94.8% (2-class) | Portable, Bayesian uncertainty filtering |
| IHC | Vision Transformer + MaskFormer (Oyelade et al., 26 Dec 2025) | 94% (4-class) | Pixel-level scoring, H&E/IHC joint mapping |
| H&E | MIL w/ attention + PCAM transfer learning (Abdulsadig et al., 2024) | AUC-ROC = 0.62 (macro) | Weakly supervised MIL, attention-based localization |
| H&E/IHC | Bi-modal ViT + GAN fusion (Qin et al., 12 Apr 2025) | 95.1% (dual); 94.3% (H&E-only) | Dynamic branch selection, context-conditional GAN |
| FISH | U-Net + VGG + RetinaNet (Schmell et al., 2020) | >90% agreement | End-to-end interpretable, per-nucleus signal counts |

Details reflect best-reported accuracy on held-out test data or consensus-matched validation cohorts.

Systems targeting clinical deployment (e.g., Jha et al., 2024; Zhang et al., 2020) emphasize full-slide automation, artifact detection, multicenter validation, and integration of pathologist-in-the-loop interfaces, achieving clinical agreement of 92% or higher.

4. Validation, Performance Evaluation, and Limitations

Automated HER2 scoring systems are assessed using standard classification metrics (accuracy, macro-AUC, F1, precision, recall, specificity) at both slide and core levels. Robust studies report per-class confusion matrices, correction/loss rates after confidence filtering, and explicit handling of equivocal (2+) classifications (Qaiser et al., 2017, Selcuk et al., 2024, Shen et al., 26 Jan 2026).

  • Challenge datasets (e.g., Her2 Challenge (Qaiser et al., 2017), HEROHE (Conde-Sousa et al., 2021)) provide head-to-head benchmarks, confirming that leading algorithms approach and sometimes exceed expert pathologist performance, especially in discriminating clear positive (3+) and negative (0/1+) cases.
  • Inter-observer variability is mitigated by fully automated or semi-automated scoring, with observed improvements in pathologist consensus and reproducibility (Pham et al., 2022, Jha et al., 2024).
  • Limitations remain regarding handling of staining artifacts, slide-to-slide and scanner variation, equivocal class discrimination, and heterogeneous tumor regions. Hardware-limited systems may trade resolution for speed and cost, and generalization beyond development cohorts often necessitates further data and algorithmic domain adaptation (Fanous et al., 2024, Shen et al., 26 Jan 2026).
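For concreteness, the per-class confusion matrices and macro metrics that such studies report can be computed as below. This is a generic sketch (equivalent to scikit-learn's `confusion_matrix` and macro-averaged `f1_score`), not any paper's evaluation code.

```python
import numpy as np

def confusion_and_macro_f1(y_true, y_pred, classes=("0", "1+", "2+", "3+")):
    """Build a per-class confusion matrix and macro-averaged F1 score."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1                  # rows: truth, cols: prediction
    f1s = []
    for i in range(len(classes)):
        tp = cm[i, i]
        prec = tp / max(cm[:, i].sum(), 1)       # column sum: predicted as i
        rec = tp / max(cm[i].sum(), 1)           # row sum: truly i
        f1s.append(0.0 if tp == 0 else 2 * prec * rec / (prec + rec))
    return cm, float(np.mean(f1s))
```

Macro averaging weights the rare equivocal (2+) class equally with the common classes, which is why it features prominently in HER2 benchmarks.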

5. Interpretability, Explainability, and Pathologist Integration

A critical focus for clinical translation is interpretability: representative systems expose attention maps, per-cell or per-region overlays, and guideline-traceable intermediate outputs, allowing pathologists to audit automated scores rather than accept opaque predictions.

Clinical integration is further enhanced by real-time AR microscope overlays, interactive ROI adjustment, threshold sliders, and support for pathologist-assisted selection or correction, allowing semi-automatic operation in high-throughput labs (Zhang et al., 2020, Zhang et al., 2021).

6. Emerging Directions

Recent advances highlight several converging directions:

  • Cost-effective and deployable platforms: Innovations in compact opto-mechanical design (BlurryScope, lensfree holography) and computational imaging shift high-fidelity scoring from centralized facilities to point-of-care, resource-limited clinics (Fanous et al., 2024, Shen et al., 26 Jan 2026).
  • Scalable annotation/low-supervision: Use of MIL, domain-adapted transfer learning, dynamic patch sampling, and GAN-based virtual IHC creation reduce dependence on pixel-level annotation and expensive immunostains (Abdulsadig et al., 2024, Rehmat et al., 23 Jun 2025, Qin et al., 12 Apr 2025).
  • Multi-modal fusion: Flexible hybrid pipelines, cross-modal feature reconstruction, and joint H&E–IHC scoring support robust results when facing incomplete or discordant modalities (Qin et al., 12 Apr 2025, Oyelade et al., 26 Dec 2025).
  • Future directions: Anticipated extensions include other stains and tissue types, incorporation of uncertainty quantification and hallucination detection, domain adaptation for scanner variation, and integration with downstream oncology decision-support tools. The need for large, multi-institutional, and balanced datasets for rigorous benchmarking is repeatedly emphasized (Selcuk et al., 2024, Shen et al., 26 Jan 2026).

7. Clinical and Research Implications

Automated HER2 scoring is positioned to enhance diagnostic precision, reduce turnaround time, and support consistent triage for HER2-targeted therapy eligibility. In settings lacking pathology expertise, these methods democratize high-quality cancer diagnosis and are extensible to other molecular biomarkers. For computational pathology research, automated HER2 scoring serves as a domain prototype for interpretable, regulatory-compliant biomarker prediction workflows and catalyzes methodological innovation in explainable AI, multi-modal modeling, and uncertainty-aware prediction.


References:

(Fanous et al., 2024, Selcuk et al., 2024, Abdulsadig et al., 2024, Schmell et al., 2020, Conde-Sousa et al., 2021, Chauhan et al., 28 Mar 2025, Cordeiro et al., 2018, Qaiser et al., 2019, Zhang et al., 2020, Zhang et al., 2021, Pham et al., 2022, Shen et al., 26 Jan 2026, Jha et al., 2024, Oyelade et al., 26 Dec 2025, Qaiser et al., 2017, Qin et al., 12 Apr 2025, Rehmat et al., 23 Jun 2025).
