Can ChatGPT Diagnose Alzheimer's Disease?

Published 10 Feb 2025 in cs.LG and cs.AI | (2502.06907v1)

Abstract: Can ChatGPT diagnose Alzheimer's Disease (AD)? AD is a devastating neurodegenerative condition that affects approximately 1 in 9 individuals aged 65 and older, profoundly impairing memory and cognitive function. This paper utilises 9300 electronic health records (EHRs) with data from Magnetic Resonance Imaging (MRI) and cognitive tests to address an intriguing question: As a general-purpose task solver, can ChatGPT accurately detect AD using EHRs? We present an in-depth evaluation of ChatGPT using a black-box approach with zero-shot and multi-shot methods. This study unlocks ChatGPT's capability to analyse MRI and cognitive test results, as well as its potential as a diagnostic tool for AD. By automating aspects of the diagnostic process, this research opens a transformative approach for the healthcare system, particularly in addressing disparities in resource-limited regions where AD specialists are scarce. Hence, it offers a foundation for a promising method for early detection, supporting individuals with timely interventions, which is paramount for Quality of Life (QoL).


Summary

The paper demonstrates that ChatGPT, when using multi-shot prompting with combined MRI and cognitive test data, achieves 94.6% diagnostic accuracy.
The study employs rigorous evaluation of zero-shot versus multi-shot methods on 9300 EHRs, highlighting improved precision and calibration with added context.
The findings imply that integrating AI diagnostics like ChatGPT can democratize early Alzheimer’s detection and potentially extend to other health conditions.


The paper "Can ChatGPT Diagnose Alzheimer’s Disease?" explores the potential of ChatGPT, a large language model (LLM), as a diagnostic tool for Alzheimer’s Disease (AD) by analyzing its performance on electronic health records (EHRs). The question matters because AD is prevalent among older adults and early detection plays a crucial role in managing the disease's progression.

Methodology

The researchers used a dataset of 9300 EHRs from the Alzheimer's Disease Neuroimaging Initiative (ADNI). The records included magnetic resonance imaging (MRI) data and cognitive test scores, with each patient classified into one of three groups: normal controls (NC), mild cognitive impairment (MCI), or AD. ChatGPT's diagnostic capabilities were evaluated using two approaches: zero-shot and multi-shot prompting.

Zero-Shot Prompting: ChatGPT generates a diagnosis without being shown any labeled examples in the prompt. This method relies on the model's ability to generalize from its pre-trained knowledge.
Multi-Shot Prompting: ChatGPT is provided with examples of question-answer pairs, including ground truth labels, to guide its predictions.
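
The difference between the two prompting modes can be sketched in a few lines. This is a minimal illustration only: the instruction wording, the `build_prompt` helper, and the field names `mmse` and `hippocampus_volume` are assumptions made for the example, not the prompts or features reported in the paper.

```python
from typing import Optional

def format_record(record: dict) -> str:
    """Render one EHR record as a flat 'key=value' string."""
    return ", ".join(f"{key}={value}" for key, value in record.items())

def build_prompt(query: dict, examples: Optional[list] = None) -> str:
    """Build a zero-shot prompt (no examples) or a multi-shot prompt.

    `examples` is a list of (record, label) pairs carrying ground-truth
    labels; leaving it empty yields the zero-shot variant.
    """
    # Hypothetical instruction text, not the wording used in the paper.
    lines = ["Classify the patient record as one of: NC, MCI, AD."]
    for record, label in (examples or []):
        lines.append(f"Record: {format_record(record)}")
        lines.append(f"Diagnosis: {label}")
    # The query record comes last, with the diagnosis left for the model.
    lines.append(f"Record: {format_record(query)}")
    lines.append("Diagnosis:")
    return "\n".join(lines)

# Hypothetical records for illustration only.
query = {"mmse": 22, "hippocampus_volume": 5.9}
shots = [
    ({"mmse": 29, "hippocampus_volume": 7.4}, "NC"),
    ({"mmse": 18, "hippocampus_volume": 5.1}, "AD"),
]
zero_shot_prompt = build_prompt(query)
multi_shot_prompt = build_prompt(query, shots)
```

The only structural difference is the block of labeled question-answer pairs placed before the query, which is the additional context that multi-shot prompting provides.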

Both approaches were tested under three data conditions: using only MRI data, only cognitive test scores, and a combination of both, to evaluate which configuration yields the most accurate diagnostics.
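
Crossing the two prompting methods with the three data conditions gives six experimental configurations. A rough sketch of that grid follows; the field groupings and the names `select_features` and `experiment_grid` are hypothetical and chosen only to illustrate the design.

```python
from itertools import product

# Hypothetical field groupings; the paper's actual feature lists may differ.
MRI_FIELDS = ("hippocampus_volume",)
COGNITIVE_FIELDS = ("mmse",)

CONDITIONS = {
    "mri": MRI_FIELDS,
    "cognitive": COGNITIVE_FIELDS,
    "combined": MRI_FIELDS + COGNITIVE_FIELDS,
}
PROMPTING = ("zero-shot", "multi-shot")

def select_features(record: dict, condition: str) -> dict:
    """Keep only the fields that belong to the given data condition."""
    return {k: record[k] for k in CONDITIONS[condition] if k in record}

def experiment_grid() -> list:
    """Enumerate every (prompting method, data condition) pair."""
    return list(product(PROMPTING, CONDITIONS))

grid = experiment_grid()  # 2 prompting methods x 3 data conditions
```

Each record would be passed through `select_features` before the prompt is built, so that the same patients are evaluated under all three data conditions.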

Results

The multi-shot method clearly outperformed the zero-shot approach in both accuracy and calibration:

Multi-Shot Performance: The highest accuracy was achieved when combining MRI with cognitive tests, reaching 94.6% accuracy at a 75% confidence threshold. This approach demonstrated significant improvements in recall, precision, and F1-score as compared to the zero-shot method, showcasing its effectiveness in leveraging the additional context provided by examples.
Zero-Shot Performance: Although less effective than the multi-shot approach, zero-shot prompting still provided valuable insights, particularly when MRI and cognitive test data were combined, achieving an accuracy of 74.4% at a similar threshold.
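
One plausible reading of "accuracy at a confidence threshold" is accuracy computed only over predictions whose reported confidence meets the threshold. The sketch below assumes that reading; the summary does not state how the paper defines the metric, so the function `accuracy_at_threshold` and the toy data are illustrative assumptions.

```python
def accuracy_at_threshold(preds, labels, confs, threshold=0.75):
    """Accuracy over predictions with confidence >= threshold.

    Returns (accuracy, coverage), where coverage is the fraction of
    predictions retained. Accuracy is None if nothing is retained.
    """
    kept = [(p, y) for p, y, c in zip(preds, labels, confs) if c >= threshold]
    if not kept:
        return None, 0.0
    accuracy = sum(p == y for p, y in kept) / len(kept)
    coverage = len(kept) / len(preds)
    return accuracy, coverage

# Toy data for illustration only, not results from the paper.
preds = ["AD", "NC", "MCI", "AD"]
labels = ["AD", "NC", "AD", "NC"]
confs = [0.90, 0.80, 0.60, 0.76]
accuracy, coverage = accuracy_at_threshold(preds, labels, confs)
```

Reporting coverage alongside accuracy matters under this reading, because a high threshold can raise accuracy simply by discarding the cases the model is unsure about.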

The calibration metrics, Expected Calibration Error (ECE) and Maximum Calibration Error (MCE), indicated that the model's stated confidence tracked its actual accuracy more closely under multi-shot prompting, particularly with combined data, suggesting more reliable model output.
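
For reference, ECE and MCE are commonly computed by grouping predictions into equal-width confidence bins and comparing each bin's average confidence with its accuracy: ECE is the size-weighted mean of those gaps and MCE is the largest gap. The sketch below follows that standard definition; the bin count and the helper name `calibration_errors` are assumptions, and the paper's exact binning is not given in this summary.

```python
def calibration_errors(confs, correct, n_bins=10):
    """Return (ECE, MCE) using equal-width confidence bins on [0, 1].

    confs:   predicted confidences in [0, 1]
    correct: 1 if the corresponding prediction was right, else 0
    """
    n = len(confs)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confs, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        gap = abs(accuracy - avg_conf)
        ece += (len(members) / n) * gap  # weight by bin size
        mce = max(mce, gap)              # worst bin
    return ece, mce

# Toy data for illustration only.
ece, mce = calibration_errors([0.95, 0.95, 0.55, 0.55], [1, 1, 1, 0])
```

Lower values of both metrics mean that the confidence attached to a diagnosis is a better guide to how often that diagnosis is correct.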

Implications and Future Directions

This research points to the potential of LLMs such as ChatGPT in medical diagnostics, especially in settings where specialists are scarce, where they could broaden access to diagnostic tools in resource-constrained environments. The study suggests that combining cognitive test results with MRI data could enhance diagnostic precision, which is vital for timely interventions that improve the quality of life of AD patients.

The promising results of ChatGPT in this domain open avenues for further exploration, including:

Extending the model’s application to a broader range of diseases using similar multimodal approaches.
Integrating fairness assessments to ensure equitable diagnostic performance across diverse demographic groups.
Comparing ChatGPT's performance with other LLMs, such as LLaMA and Google Gemini, to determine its relative efficacy and operational advantages.

In summary, while the application of ChatGPT to AD diagnosis shows potential, further research and development are required to refine its capabilities, ensure fairness, and broaden its applicability.
