Deep Learning-Based Noninvasive Screening of Type 2 Diabetes with Chest X-ray Images and Electronic Health Records

Published 14 Dec 2024 in cs.LG and cs.CV | (2412.10955v1)

Abstract: The imperative for early detection of type 2 diabetes mellitus (T2DM) is challenged by its asymptomatic onset and dependence on suboptimal clinical diagnostic tests, contributing to its widespread global prevalence. While research into noninvasive T2DM screening tools has advanced, conventional machine learning approaches remain limited to unimodal inputs due to extensive feature engineering requirements. In contrast, deep learning models can leverage multimodal data for a more holistic understanding of patients' health conditions. However, the potential of chest X-ray (CXR) imaging, one of the most commonly performed medical procedures, remains underexplored. This study evaluates the integration of CXR images with other noninvasive data sources, including electronic health records (EHRs) and electrocardiography signals, for T2DM detection. Utilising datasets meticulously compiled from the MIMIC-IV databases, we investigated two deep fusion paradigms: an early fusion-based multimodal transformer and a modular joint fusion ResNet-LSTM architecture. The end-to-end trained ResNet-LSTM model achieved an AUROC of 0.86, surpassing the CXR-only baseline by 2.3% with just 9863 training samples. These findings demonstrate the diagnostic value of CXRs within multimodal frameworks for identifying at-risk individuals early. Additionally, the dataset preprocessing pipeline has also been released to support further research in this domain.


Summary

The paper introduces a novel multimodal approach that combines chest X-rays, EHRs, and ECG data for noninvasive T2DM screening.
The joint fusion ResNet-LSTM model achieved a 2.3% AUROC improvement over the CXR-only baseline, demonstrating enhanced predictive performance.
The study underscores the potential of integrated deep learning models to facilitate early T2DM detection in resource-limited clinical settings.


This paper presents an approach for noninvasive screening of type 2 diabetes mellitus (T2DM) that applies deep learning to chest X-ray (CXR) images and electronic health records (EHRs), with electrocardiography (ECG) signals as an additional input. It addresses the limitations of conventional unimodal machine learning approaches, which depend on extensive feature engineering and are confined to a single data modality. The research compares two multimodal deep learning architectures, an early fusion multimodal transformer and a joint fusion ResNet-LSTM, and shows that integrating clinical data sources can improve T2DM prediction.

Figure 1: ViLT model architecture with the $\mathbf{D}_{\boldsymbol{E+C+G}}$ dataset.

Figure 2: ResNet-LSTM model architecture with the $\mathbf{D}_{\boldsymbol{E+C+G}}$ dataset.
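
The figure captions name dataset variants by the modalities they contain. Reading the subscripts as E = EHR, C = CXR, and G = ECG (an assumption inferred from the surrounding text), a variant such as $\mathbf{D}_{\boldsymbol{E+C+G}}$ keeps only subjects that have every listed modality. The sketch below illustrates that pairing step under those assumptions; the record layout, the `Sample` class, and `build_datasets` are hypothetical and do not reproduce the released preprocessing pipeline or actual MIMIC-IV column names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    """One paired example (hypothetical layout, not the paper's schema)."""
    subject_id: int
    ehr: List[float]            # EHR features (E)
    cxr_path: str               # path to the chest X-ray image (C)
    ecg: Optional[List[float]]  # ECG signal (G); None in the E+C variant
    label: int                  # 1 = T2DM, 0 = no T2DM (assumed labeling)


def build_datasets(
    ehr: Dict[int, List[float]],
    cxr: Dict[int, str],
    ecg: Dict[int, List[float]],
    labels: Dict[int, int],
) -> Dict[str, List[Sample]]:
    """Pair modalities by subject ID into E+C and E+C+G dataset variants."""
    # Subjects with EHR, CXR, and a label form the E+C variant.
    ec_ids = sorted(set(ehr) & set(cxr) & set(labels))
    # The E+C+G variant additionally requires an ECG recording.
    ecg_ids = [s for s in ec_ids if s in ecg]
    return {
        "E+C": [Sample(s, ehr[s], cxr[s], None, labels[s]) for s in ec_ids],
        "E+C+G": [Sample(s, ehr[s], cxr[s], ecg[s], labels[s]) for s in ecg_ids],
    }


if __name__ == "__main__":
    ehr = {1: [0.5], 2: [0.1], 3: [0.9]}
    cxr = {1: "p1.jpg", 2: "p2.jpg", 4: "p4.jpg"}
    ecg = {2: [0.0, 1.0]}
    labels = {1: 1, 2: 0, 3: 1, 4: 0}
    ds = build_datasets(ehr, cxr, ecg, labels)
    print(len(ds["E+C"]), len(ds["E+C+G"]))  # 2 1
```

Requiring every modality shrinks the usable cohort, which is one reason a multimodal variant can end up with a comparatively small training set.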

Introduction to Multimodal Approaches

The study begins by discussing the asymptomatic onset of T2DM, which complicates early detection and often delays intervention. T2DM prevalence continues to rise, underscoring the need for reliable early screening tools. Traditional screening relies largely on blood tests, which are invasive and may lead to delayed detection. The paper instead proposes using widely available CXR images together with EHRs to build a more comprehensive, noninvasive diagnostic framework.

Methods and Dataset Preparation

The research uses datasets compiled from the MIMIC-IV databases that pair CXR images with EHR data and, in one variant, electrocardiography (ECG) signals. Two architectures were evaluated:

Multimodal Transformer (ViLT): A Vision-and-Language Transformer, which processes image patches without convolutional feature extractors or region supervision, uses an early fusion strategy to integrate EHR, CXR, and ECG data within a single transformer.
Joint Fusion ResNet-LSTM: A modular design in which a residual network (ResNet) encodes the CXR and LSTMs encode the EHR and ECG time series, with the encoders trained end-to-end together with the fusion classifier.
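
The two paradigms differ mainly in where the modalities meet. The following minimal pure-Python sketch shows only the forward data flow with untrained stand-ins, assuming all tokens already share one width: a parameter-free self-attention layer in place of ViLT's transformer, mean pooling in place of the ResNet, and an exponential-moving-average recurrence in place of the LSTMs. All function names and the constant modality-type offsets are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def self_attention(seq: Sequence[Vector]) -> List[Vector]:
    """Parameter-free scaled dot-product self-attention (queries = keys = values)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) / total for j in range(d)])
    return out


def mean_pool(tokens: Sequence[Vector]) -> Vector:
    """Average equal-width token vectors (stand-in for a ResNet's global pooling)."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def recurrent_encoder(seq: Sequence[Vector], alpha: float = 0.5) -> Vector:
    """Stand-in for an LSTM: h_t = alpha * h_{t-1} + (1 - alpha) * x_t; returns the last state."""
    h = [0.0] * len(seq[0])
    for x in seq:
        h = [alpha * hi + (1 - alpha) * xi for hi, xi in zip(h, x)]
    return h


def with_type_offset(tokens: Sequence[Vector], type_id: int) -> List[Vector]:
    """Hypothetical modality-type embedding: a constant offset per modality."""
    return [[x + 0.1 * type_id for x in tok] for tok in tokens]


def early_fusion(ehr: Sequence[Vector], cxr: Sequence[Vector], ecg: Sequence[Vector]) -> Vector:
    """ViLT-style early fusion: one concatenated token sequence through one shared encoder."""
    seq = with_type_offset(ehr, 0) + with_type_offset(cxr, 1) + with_type_offset(ecg, 2)
    return mean_pool(self_attention(seq))


def joint_fusion(ehr: Sequence[Vector], cxr: Sequence[Vector], ecg: Sequence[Vector]) -> Vector:
    """Joint fusion: a separate encoder per modality, embeddings concatenated."""
    return recurrent_encoder(ehr) + mean_pool(cxr) + recurrent_encoder(ecg)


def classify(z: Vector, weights: Vector, bias: float) -> float:
    """Logistic head mapping a fused embedding to a T2DM probability."""
    logit = sum(w * x for w, x in zip(weights, z)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

In early fusion every modality must first be projected to a shared token width so the sequences can be concatenated, whereas joint fusion lets each encoder keep its own output size, which is what makes its encoders modular. In a real end-to-end setup, the classifier loss would also update the encoder weights.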

Both architectures were compared by AUROC. The end-to-end trained ResNet-LSTM model achieved an AUROC of 0.86, improving on the CXR-only baseline by 2.3% with only 9,863 training samples.
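
The summary does not state whether the 2.3% gain is absolute (AUROC points) or relative. Assuming, for illustration only, that it is measured from the 0.8616 AUROC reported below for the EHR+CXR+ECG dataset, the two readings imply slightly different CXR-only baselines:

```latex
\text{absolute: } \mathrm{AUROC}_{\mathrm{CXR}} = 0.8616 - 0.023 = 0.8386
\qquad
\text{relative: } \mathrm{AUROC}_{\mathrm{CXR}} = \frac{0.8616}{1.023} \approx 0.8422
```

Under either reading the implied baseline sits near 0.84, so the multimodal model outperforms the CXR-only model in both cases.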

Experimental Results

The ResNet-LSTM model performed best, with an AUROC of 0.8616 on the dataset combining EHR, CXR, and ECG data and 0.8592 on the dataset without ECG. Integrating multiple modalities thus improved on the CXR-only baseline, although adding ECG raised AUROC by only 0.0024, a minimal diagnostic gain. Overall, the multimodal inputs improved the discriminative performance of T2DM detection.
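
Whether a gap as small as 0.0024 AUROC is meaningful depends on sampling variability. One standard check, not necessarily what the authors did, is a paired bootstrap: resample the same test indices for both models and collect the AUROC differences. The sketch assumes binary labels (1 = T2DM) and per-subject scores from two models evaluated on the same subjects (for example, those that have all three modalities); `auroc` and `paired_bootstrap_delta` are illustrative helpers, not functions from the paper's code.

```python
import random
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def paired_bootstrap_delta(
    labels: Sequence[int],
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_boot: int = 1000,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (observed delta, 2.5th pct, 97.5th pct) of AUROC(a) - AUROC(b)."""
    rng = random.Random(seed)
    n = len(labels)
    observed = auroc(labels, scores_a) - auroc(labels, scores_b)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # same indices for both models
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # a resample must contain both classes
            a = [scores_a[i] for i in idx]
            b = [scores_b[i] for i in idx]
            deltas.append(auroc(y, a) - auroc(y, b))
    deltas.sort()
    lo = deltas[(n_boot * 25) // 1000]
    hi = deltas[max((n_boot * 975) // 1000 - 1, 0)]
    return observed, lo, hi
```

A 95% interval that contains zero would suggest the ECG contribution cannot be distinguished from noise at that sample size. The quadratic-time `auroc` is written for clarity; on real test sets a rank-based implementation such as scikit-learn's `roc_auc_score` is preferable.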

Ablation Studies

Ablation studies revealed several key insights:

Pre-training Benefits: Pre-training the ViT substantially improved performance, demonstrating the value of transfer learning.
Robustness to Noise: The ViLT model was more robust to noisy inputs than the joint fusion models, likely because its self-attention layers draw on global context across all input tokens.
Handling Missing Modalities: Joint fusion models better handled missing CXR data due to their modular encoders, highlighting the resilience of such architectures.
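
These ablations imply two evaluation manipulations: degrading inputs with noise and withholding the CXR. Since the exact protocol is not described here, the following is an assumed setup: i.i.d. Gaussian noise with standard deviation `sigma`, CXRs dropped at a given `rate`, and, for a joint fusion model, a zero-filled CXR slot plus a presence flag so the classifier head still receives a fixed-width vector. All helper names are hypothetical.

```python
import random
from typing import List, Optional

Vector = List[float]


def fuse_with_missing_cxr(ehr_emb: Vector, cxr_emb: Optional[Vector], cxr_dim: int) -> Vector:
    """Concatenate embeddings, zero-filling the CXR slot and appending a presence flag."""
    if cxr_emb is None:
        # Missing CXR: keep the fused width fixed so the same classifier head applies.
        return ehr_emb + [0.0] * cxr_dim + [0.0]
    if len(cxr_emb) != cxr_dim:
        raise ValueError(f"expected CXR embedding of width {cxr_dim}, got {len(cxr_emb)}")
    return ehr_emb + cxr_emb + [1.0]


def add_gaussian_noise(x: Vector, sigma: float, rng: random.Random) -> Vector:
    """Perturb an input with i.i.d. Gaussian noise for a robustness check."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def drop_cxr(
    cxr_embs: List[Optional[Vector]], rate: float, rng: random.Random
) -> List[Optional[Vector]]:
    """Simulate missing CXRs by replacing roughly `rate` of the embeddings with None."""
    return [None if rng.random() < rate else e for e in cxr_embs]
```

Because each encoder in a modular design runs independently, the EHR and ECG branches are unaffected when the image is absent, and a presence flag gives a trained head a signal to rely on them instead.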

Conclusion

This study demonstrates that integrating CXRs and EHRs can improve the discriminative performance (AUROC) of T2DM detection, even with a limited training set of 9,863 samples. The approach has potential implications for clinical practice, especially in resource-limited environments. Future work should validate these models on external datasets and explore alternative fusion architectures to exploit multimodal data more fully.

This research marks a significant step toward practical, noninvasive screening tools for T2DM, paving the way for more personalized and early interventions in diabetes care.
