
Comparison of Deep Learning Approaches for Multi-Label Chest X-Ray Classification

Published 6 Mar 2018 in cs.CV (arXiv:1803.02315v2)

Abstract: The increased availability of X-ray image archives (e.g. the ChestX-ray14 dataset from the NIH Clinical Center) has triggered a growing interest in deep learning techniques. To provide better insight into the different approaches, and their applications to chest X-ray classification, we investigate a powerful network architecture in detail: the ResNet-50. Building on prior work in this domain, we consider transfer learning with and without fine-tuning as well as the training of a dedicated X-ray network from scratch. To leverage the high spatial resolution of X-ray data, we also include an extended ResNet-50 architecture, and a network integrating non-image data (patient age, gender and acquisition type) in the classification process. In a concluding experiment, we also investigate multiple ResNet depths (i.e. ResNet-38 and ResNet-101). In a systematic evaluation, using 5-fold re-sampling and a multi-label loss function, we compare the performance of the different approaches for pathology classification by ROC statistics and analyze differences between the classifiers using rank correlation. Overall, we observe a considerable spread in the achieved performance and conclude that the X-ray-specific ResNet-38, integrating non-image data, yields the best overall results. Furthermore, class activation maps are used to understand the classification process, and a detailed analysis of the impact of non-image features is provided.

Citations (368)

Summary

  • The paper shows that fine-tuning ResNet-50 via transfer learning boosts the average AUC from 0.730 to 0.819.
  • It compares standard and extended ResNet-50 models, with the high-resolution variant achieving a modest yet significant performance gain.
  • Integrating non-image features like patient demographics yields a slight AUC improvement, underscoring the value of multi-modal data in clinical diagnostics.

Evaluation of Deep Learning Approaches for Multi-Label Chest X-Ray Classification

The paper presents a methodical examination of deep learning strategies for multi-label classification of chest X-ray images using the large ChestX-ray14 dataset. The aim is to assess how network architecture, weight initialization, and auxiliary feature integration affect classification performance. The study centers on the ResNet-50 network under three regimes: transfer learning with off-the-shelf weights, transfer learning with fine-tuning, and training from scratch.

A focal point of this research is the ResNet-50 architecture and its extended variant, ResNet-50-large, which accommodates the high spatial resolution of the ChestX-ray14 data through a larger input size. The study also explores integrating non-image data (patient age, gender, and image acquisition type) into the network architecture. This holistic approach mirrors the diagnostic process in clinical environments, where additional patient information is taken into account.
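The integration of non-image data can be pictured as a simple late-fusion step: normalize the patient attributes and append them to the pooled CNN feature vector before the final classification layer. The sketch below is illustrative only; the function name, the normalization, and the exact fusion point are assumptions, not the paper's implementation.

```python
import numpy as np

def fuse_features(image_features, age, is_male, is_pa):
    # Crudely normalize the non-image attributes and append them to the
    # pooled CNN feature vector before the final classification layer.
    meta = np.array([age / 100.0, float(is_male), float(is_pa)])
    return np.concatenate([image_features, meta])

# ResNet-50's global-average-pooled feature vector has 2048 dimensions.
img_feat = np.zeros(2048)
fused = fuse_features(img_feat, age=63, is_male=True, is_pa=False)
print(fused.shape)  # (2051,)
```

The classifier head then operates on the 2051-dimensional fused vector, so image and patient information jointly influence every label prediction.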

Analysis and Results

The evaluation process employs a robust 5-fold resampling technique coupled with a multi-label loss function to ensure a comprehensive analysis of classification performance. The study explores three primary areas: weight initialization and transfer learning, the impact of network architecture, and the incorporation of non-image features.
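A multi-label loss in this setting is commonly a per-label binary cross-entropy averaged over the 14 pathology classes, each treated as an independent binary problem. The sketch below illustrates that idea; the paper may use a weighted variant, so this is an assumption rather than its exact loss.

```python
import math

def multilabel_bce(probs, targets):
    # Mean per-label binary cross-entropy: each pathology label is
    # scored as an independent binary classification problem.
    eps = 1e-7
    per_label = [-(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
                 for p, t in zip(probs, targets)]
    return sum(per_label) / len(per_label)

# Confident correct predictions give a small loss; confident wrong ones a large loss.
good = multilabel_bce([0.9, 0.1, 0.8], [1, 0, 1])  # ~0.145
bad = multilabel_bce([0.1, 0.9, 0.2], [1, 0, 1])
```

Because the loss decomposes per label, a single image can contribute positive targets for several pathologies at once, which is exactly what the multi-label setting requires.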

  1. Weight Initialization and Transfer Learning: The analysis found that transfer learning with fine-tuning yields significant performance gains over training from scratch or using off-the-shelf features. The ResNet-50 architecture fine-tuned on the ChestX-ray14 dataset delivered an average AUC of 0.819, a notable leap from the baseline performance of off-the-shelf networks (AUC 0.730).
  2. Network Architecture Variations: The extended ResNet-50-large achieved a marginal improvement in average AUC over its standard counterpart, demonstrating the benefit of higher input resolution in effectively distinguishing intricate pathological features like masses or nodules.
  3. Non-Image Features Integration: The incorporation of non-image data delivered a slight increase in average AUC, with the ResNet-50-large-meta variant attaining the highest average AUC of 0.822. This suggests that while the integration of non-image features yields benefits, the image features extracted by the network already encapsulate substantial information, as corroborated by the ability of image-based networks to predict these attributes with notable accuracy independently.
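The ROC statistics behind these averages reduce, per label, to the Mann-Whitney interpretation of AUC: the probability that a randomly drawn positive case is scored above a randomly drawn negative one. A minimal sketch with hypothetical scores:

```python
def label_auc(scores, labels):
    # ROC AUC via the Mann-Whitney statistic: the probability that a
    # randomly drawn positive case outranks a randomly drawn negative.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# The reported average AUC is the mean of the per-label values.
aucs = [label_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),   # perfect: 1.0
        label_auc([0.9, 0.3, 0.8, 0.2], [1, 0, 0, 1])]   # chance: 0.5
mean_auc = sum(aucs) / len(aucs)
```

Averaging per-label AUCs treats every pathology equally regardless of its prevalence, which is why a small gain (0.819 to 0.822) can still reflect consistent per-class improvements.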

The rank correlation analysis of model outputs shows that models trained solely on X-ray data make largely consistent predictions, pointing to avenues for further research into model robustness and consistency.
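Rank correlation between two classifiers can be computed with Spearman's coefficient, i.e. the Pearson correlation of the ranks. A minimal sketch over hypothetical per-class AUCs, assuming no tied values to keep the closed-form formula valid:

```python
def spearman(x, y):
    # Spearman rank correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1)),
    # valid when there are no tied values (assumed here for simplicity).
    n = len(x)
    def ranks(v):
        order = sorted(range(n), key=lambda i: v[i])
        r = [0] * n
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Per-class AUCs of two hypothetical models: identical ordering -> rho = 1.0.
rho = spearman([0.81, 0.74, 0.90, 0.68], [0.79, 0.75, 0.88, 0.70])
```

A coefficient near 1 means the two models rank the cases (or classes) the same way even when their raw scores differ, which is the sense in which the X-ray-only models agree.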

Implications and Future Directions

The findings of this research underscore the potential of deep learning to automate the interpretation of large-scale medical image datasets, addressing the scarcity of expert radiological review amid increasing patient volumes. However, reliance on datasets with label noise, such as ChestX-ray14, leaves open challenges in reliably identifying clinically relevant pathology, as highlighted by the Grad-CAM analysis, which indicates misidentification of pneumothorax in treated cases.
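Plain class activation maps, which Grad-CAM generalizes by deriving the channel weights from gradients, can be sketched as a weighted sum over the final convolutional feature maps. The shapes below assume a ResNet-style 2048-channel, 7x7 final feature block; the array contents are placeholders, not real network activations.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    # Weight each channel's spatial map by the classifier weight for the
    # target class and sum over channels -> coarse localization heatmap.
    return np.tensordot(class_weights, feature_maps, axes=1)

# Assumed ResNet-style final block: 2048 channels of 7x7 spatial maps.
fmaps = np.ones((2048, 7, 7))
weights = np.full(2048, 1.0 / 2048)
heatmap = class_activation_map(fmaps, weights)
print(heatmap.shape)  # (7, 7)
```

Inspecting where such heatmaps concentrate is what reveals failure modes like the pneumothorax case: the network may attend to treatment artifacts rather than the pathology itself.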

Practically, integration into clinical workflows will require further refinement of network architectures and evaluation methodologies. Future work could develop architectures that exploit dependencies among inter-related labels or incorporate segmentation techniques to enhance spatial feature extraction. As large-scale annotated medical datasets become more widely available, continued advances in model adaptability and interpretability will help move these findings from the research domain to clinical application.
