MedMobile: A mobile-sized language model with expert-level clinical capabilities

Published 11 Oct 2024 in cs.CL (arXiv:2410.09019v1)

Abstract: Language models (LMs) have demonstrated expert-level reasoning and recall abilities in medicine. However, computational costs and privacy concerns are mounting barriers to wide-scale implementation. We introduce a parsimonious adaptation of phi-3-mini, MedMobile, a 3.8 billion parameter LM capable of running on a mobile device, for medical applications. We demonstrate that MedMobile scores 75.7% on the MedQA (USMLE), surpassing the passing mark for physicians (~60%), and approaching the scores of models 100 times its size. We subsequently perform a careful set of ablations, and demonstrate that chain of thought, ensembling, and fine-tuning lead to the greatest performance gains, while unexpectedly retrieval augmented generation fails to demonstrate significant improvements.

Summary

  • The paper introduces MedMobile, a compact 3.8B parameter model that achieves a 75.7% MedQA score, surpassing the physician passing threshold.
  • It employs fine-tuning with both manually curated and synthetic data, integrating chain-of-thought reasoning and response ensembling to boost accuracy.
  • MedMobile’s efficient design enables expert-level performance on mobile devices, highlighting its potential for clinical applications in resource-constrained settings.

MedMobile: A Mobile-Sized LLM for Clinical Applications

The paper introduces MedMobile, a compact LLM designed for medical applications, with a scale of 3.8 billion parameters. This model is built on an adaptation of phi-3-mini and represents an effort to mitigate the computational costs and privacy concerns associated with larger models. MedMobile is specifically tailored to operate on mobile devices, achieving expert-level clinical performance.

Key Achievements and Methodology

MedMobile achieves a 75.7% score on the MedQA (USMLE), surpassing the physician passing mark of approximately 60%. This performance is noteworthy given the model's small parameter count: it is the smallest model reported to exceed the physician passing score on MedQA, and it substantially outperforms other models in its parameter range.

Several technical enhancements were applied to reach these results. The model was fine-tuned on both manually curated and synthetically generated data, demonstrating the utility of synthetic data produced by larger models such as GPT-4. The methods contributing most to performance were chain-of-thought (CoT) reasoning, response ensembling, and supervised fine-tuning (SFT). Notably, retrieval augmented generation (RAG) did not yield significant improvements, contrary to prevailing assumptions.
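Response ensembling for multiple-choice QA is typically implemented as a majority vote over answers extracted from several independently sampled CoT generations. The paper does not publish its exact procedure, so the sketch below is a minimal, hypothetical illustration of that voting step; the sampled answer letters are stand-ins for real model outputs.

```python
from collections import Counter

def ensemble_answer(sampled_answers):
    """Majority vote over answer letters extracted from independently
    sampled chain-of-thought generations. Ties are broken by first
    occurrence, since Counter preserves insertion order (Python 3.7+)."""
    return Counter(sampled_answers).most_common(1)[0][0]

# Five hypothetical sampled runs on one MedQA-style question:
samples = ["B", "B", "C", "B", "A"]
print(ensemble_answer(samples))  # prints "B"
```

In practice each element of `samples` would come from parsing a separate temperature-sampled generation, so the vote aggregates diverse reasoning paths into one answer.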

Comparative Performance

MedMobile's performance was evaluated against other open-source models, such as UltraMedical Llama 8B, within the MultiMedQA framework, which encompasses a variety of medical tasks from the USMLE to PubMedQA. Remarkably, MedMobile's accuracy equaled or surpassed models with significantly more parameters across several evaluation tasks.

Despite its reduced size, MedMobile displayed effective medical reasoning, providing contextual and logical responses to complex medical scenarios. This performance can be attributed to its training with CoT from GPT-4, distilling advanced problem-solving techniques into a more compact model.
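CoT distillation of this kind is usually done by pairing each training question with a teacher-written rationale and fine-tuning the student on the combined text. The paper does not specify its prompt format, so the helper below (`make_sft_example` is a hypothetical name) sketches one plausible way to assemble such a supervised fine-tuning record.

```python
def make_sft_example(question, choices, teacher_cot, answer):
    """Format a multiple-choice question plus a teacher model's
    chain-of-thought rationale into a prompt/completion pair for
    supervised fine-tuning of a smaller student model."""
    options = "\n".join(f"{k}. {v}" for k, v in sorted(choices.items()))
    prompt = f"{question}\n{options}\nThink step by step, then answer."
    completion = f"{teacher_cot}\nAnswer: {answer}"
    return {"prompt": prompt, "completion": completion}

ex = make_sft_example(
    "Which vitamin deficiency causes scurvy?",
    {"A": "Vitamin C", "B": "Vitamin D"},
    "Scurvy results from impaired collagen synthesis, which requires "
    "vitamin C as a cofactor for prolyl hydroxylase.",
    "A",
)
```

Training on the rationale text, not just the final letter, is what transfers the teacher's step-by-step problem-solving style into the student.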

Implications and Future Directions

MedMobile's development points toward significant opportunities in resource-constrained environments due to its low computational demand. It democratizes access to AI-driven insights, potentially assisting healthcare providers and patients without extensive technological infrastructure.
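To see why a 3.8B-parameter model is mobile-feasible, a back-of-envelope weight-storage estimate helps. The paper confirms on-device operation but not a specific quantization scheme, so the bit widths below are illustrative assumptions.

```python
def model_memory_gb(n_params, bits_per_param):
    """Approximate weight-storage footprint in GB, ignoring
    activations, KV cache, and runtime overhead."""
    return n_params * bits_per_param / 8 / 1e9

# Weight footprint of a 3.8B-parameter model at common precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {model_memory_gb(3.8e9, bits):.1f} GB")
# 16-bit: 7.6 GB, 8-bit: 3.8 GB, 4-bit: 1.9 GB
```

At 4-bit precision the weights fit in roughly 2 GB, within the memory budget of current smartphones, whereas a model 100 times larger would not come close.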

The paper suggests possible expansions to vision-language models (VLMs) to integrate visual data with language processing, leveraging modalities such as photoacoustic imaging. Moreover, exploring multi-agent systems could enhance problem-solving accuracy by distributing tasks among iterative MedMobile instances.

In conclusion, MedMobile signifies a meaningful shift towards efficient, mobile-compatible AI in the medical domain. The techniques demonstrated could extend beyond healthcare, providing a pathway for the adaptation of mobile-sized LLMs in various specialized fields. This work sets a foundation for more inclusive access to AI capabilities, fostering both clinical and broader industry advancements.
