Radiology-GPT: A Large Language Model for Radiology
Abstract: We introduce Radiology-GPT, a large language model (LLM) for radiology. Instruction-tuned on an extensive dataset of radiology domain knowledge, Radiology-GPT outperforms general-purpose LLMs such as StableLM, Dolly, and LLaMA, and exhibits notable versatility in radiological diagnosis, research, and communication. This work serves as a catalyst for future developments in clinical NLP: the successful implementation of Radiology-GPT demonstrates the potential of localized generative LLMs tailored to individual medical specialties while adhering to privacy standards such as HIPAA. The prospect of developing individualized, large-scale LLMs that cater to the specific needs of different hospitals is a promising direction, and the fusion of conversational competence and domain-specific knowledge in such models is set to foster further development in healthcare AI. A demo of Radiology-GPT is available at https://huggingface.co/spaces/allen-eric/radiology-gpt.
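The abstract's core technique is instruction tuning on radiology domain knowledge. As a minimal illustration, the sketch below formats a radiology report into an Alpaca-style instruction example; the field names, prompt template, and the findings-to-impression pairing are illustrative assumptions, not the authors' actual data format.

```python
# Hypothetical sketch: turning a radiology report into an
# Alpaca-style instruction-tuning example. The template and the
# findings -> impression pairing are assumptions for illustration.

def build_instruction_example(findings: str, impression: str) -> dict:
    """Pair a report's findings with its impression as one training example."""
    return {
        "instruction": "Derive the impression from the radiology findings.",
        "input": findings,
        "output": impression,
    }

def to_prompt(example: dict) -> str:
    """Render one example as a single prompt string for supervised tuning."""
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Input:\n{example['input']}\n\n"
        f"### Response:\n{example['output']}"
    )

if __name__ == "__main__":
    ex = build_instruction_example(
        findings="Mild cardiomegaly. No focal consolidation or effusion.",
        impression="Mild cardiomegaly without acute cardiopulmonary disease.",
    )
    print(to_prompt(ex))
```

Examples in this form can then be fed to any standard supervised fine-tuning loop; the choice of prompt delimiters only needs to be consistent between training and inference.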