Self-learning for weakly supervised Gleason grading of local patterns

Published 21 May 2021 in eess.IV and cs.CV | (2105.10420v1)

Abstract: Prostate cancer is one of the main diseases affecting men worldwide. The gold standard for diagnosis and prognosis is the Gleason grading system. In this process, pathologists manually analyze prostate histology slides under microscope, in a high time-consuming and subjective task. In the last years, computer-aided-diagnosis (CAD) systems have emerged as a promising tool that could support pathologists in the daily clinical practice. Nevertheless, these systems are usually trained using tedious and prone-to-error pixel-level annotations of Gleason grades in the tissue. To alleviate the need of manual pixel-wise labeling, just a handful of works have been presented in the literature. Motivated by this, we propose a novel weakly-supervised deep-learning model, based on self-learning CNNs, that leverages only the global Gleason score of gigapixel whole slide images during training to accurately perform both, grading of patch-level patterns and biopsy-level scoring. To evaluate the performance of the proposed method, we perform extensive experiments on three different external datasets for the patch-level Gleason grading, and on two different test sets for global Grade Group prediction. We empirically demonstrate that our approach outperforms its supervised counterpart on patch-level Gleason grading by a large margin, as well as state-of-the-art methods on global biopsy-level scoring. Particularly, the proposed model brings an average improvement on the Cohen's quadratic kappa (k) score of nearly 18% compared to full-supervision for the patch-level Gleason grading task.

Abstract PDF Upgrade to Chat

Citations (32)

View on Semantic Scholar

Summary

The paper presents a weakly supervised self-learning framework that leverages global Gleason scores to generate patch-level pseudo-labels.
It employs a teacher-student CNN architecture with multiple-instance learning and max-pooling, achieving a Cohen's quadratic kappa of 0.82 on patch-level grading.
The approach reduces reliance on detailed pixel-level annotations, offering increased generalizability and robustness in computational pathology.

Self-learning for Weakly Supervised Gleason Grading of Local Patterns

Introduction

The paper "Self-learning for weakly supervised Gleason grading of local patterns" (2105.10420) introduces an innovative methodology to tackle the challenges in grading prostate cancer via histological images using deep learning techniques. The Gleason grading system remains the gold standard for prostate cancer diagnosis and prognosis. However, manual grading of prostate cancer through histology slides is highly subjective and time-consuming. Computer-aided diagnosis (CAD) systems aim to assist pathologists in this task, but they typically require labor-intensive pixel-level annotations. This paper proposes a weakly supervised self-learning framework using convolutional neural networks (CNNs) to overcome these limitations by leveraging global Gleason scores of whole slide images (WSIs).

In prostate cancer, Gleason grades (GG3, GG4, GG5) categorize cancer severity based on gland morphology observed in histology samples (Figure 1). The diagnostic process involves grading local glandular patterns, using the two most prominent grades to determine a global Gleason score, which is crucial but subject to high variability.

Figure 1: Histology regions of prostate biopsies. (a): region containing benign glands, (b): region containing GG3 glandular structures, (c): region containing GG4 patterns, (d): region containing GG5 patterns.

Methods

The proposed method comprises two main stages using a teacher-student training framework. The teacher model employs a multiple-instance learning (MIL) approach to infer patch-level Gleason grades from biopsy-level annotations by aggregating patch-level inferences (Figure 2). The teacher model uses global image labels to generate pseudo-labels for patch-level learning.

Figure 2: Self-learning CNNs pipeline for weakly supervised Gleason grading of local cancerous patterns in whole slide images. MIL: multiple-instance learning; WSI: whole slide image; NC: non-cancerous; GS: Gleason score; GG: Gleason grade.

During the prediction phase, the teacher model applies a max-pooling operation to combine the inferences of local instances into whole slide predictions, optimizing high-confidence predictions (Figure 3).

Figure 3: Teacher CNN for the prediction of local Gleason grades in a multiple-instance learning framework. GG: Gleason grade.

Subsequently, a label refinement process post-produces pseudo-labels from the teacher model, which are then used to train an equivalent student model on a pseudo-supervised dataset. This step eliminates false negative patches and enhances the learning capabilities of the student model.

Results

Experiments evaluated the methods across different datasets for patch-level grading and biopsy-level scoring. For patch-level Gleason grading, the student model using max-pooling achieved an inter-dataset average Cohen's quadratic kappa of 0.82, indicating substantial agreement, outperforming previous supervised approaches that suffered from generalization limitations to external datasets (Figure 4).

Figure 4: Histology regions with mixed Gleason grades from SICAP dataset. (a): region containing benign and GG3 glands, (b): region containing GG3 and GG4 glandular structures, (c) and (d): region containing GG4 and GG5 patterns.

Confusion matrices display the high accuracy of the student model for predicting patch-level Gleason grades on various datasets, showing adaptability to diverse cases (Figure 5).

Figure 5: Confusion Matrix of the patch-level Gleason grades prediction done by Student CNN on the different test cohorts. (a): SICAP; (b): ARVANITI; (c): GERTYCH.

For global biopsy scoring, the student model's features were averaged and used with k-nearest neighbor classifiers to predict biopsy-level Grade Groups (Figure 6). This methodology proved superior in external validation tests, providing robust clustering results over conventional Gleason grading percentage approaches.

Figure 6: Confusion Matrix of the biopsy-level Grade Group prediction done by Student CNN features and k-Nearest Neighbor classifier, on the two test cohorts. (a): PANDA; (b): SICAP.

Discussion

The implications highlight the potential of weakly supervised learning frameworks to improve the reliability and robustness of Gleason grading systems. The removal of annotator bias and leveraging vast datasets enables the development of more generalizable models. Furthermore, this strategy could lead to reduced computational constraints since biopsies can be processed completely without pixel-level annotations.

Future research may explore optimizing normalization techniques to enhance generalization further and refine the framework for even larger biopsy datasets, which might provide more extensive coverage of heterogeneous cancer patterns.

Conclusion

The paper presents a significant contribution to the field of histopathology with its innovative use of self-learning and CNNs for prostate cancer grading. The system's superior performance in patch-level grading and biopsy-level scoring, the capacity to generalize without pixel-level annotations, and the focus on leveraging global Gleason scores mark a substantial advance. This approach not only validates the promise of weakly supervised learning but also paves the way for further developments in computational pathology.

Markdown Report Issue