Rationale-Guided Retrieval Augmented Generation for Medical Question Answering
Abstract: Large language models (LLMs) hold significant promise for biomedical applications, but they suffer from hallucinations and outdated knowledge. Retrieval-augmented generation (RAG) is commonly employed to address these issues, yet it introduces challenges of its own: (1) LLMs are vulnerable to irrelevant or incorrect context, (2) medical queries are often poorly targeted at the information that would actually help, and (3) retrievers tend to be biased toward the specific corpus they were trained on. In this study, we present RAG² (RAtionale-Guided RAG), a new framework for enhancing the reliability of RAG in biomedical contexts. RAG² incorporates three key innovations: (1) a small filtering model, trained on perplexity-based labels of rationales, that selectively augments informative document snippets while filtering out distractors; (2) LLM-generated rationales used as queries to improve the utility of retrieved snippets; and (3) a retrieval structure that draws snippets evenly from a comprehensive set of four biomedical corpora, effectively mitigating retriever bias. Our experiments demonstrate that RAG² improves state-of-the-art LLMs of varying sizes by up to 6.1%, and that it outperforms the previous best medical RAG model by up to 5.6% across three medical question-answering benchmarks. Our code is available at https://github.com/dmis-lab/RAG2.
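The perplexity-based labeling behind the filtering model can be sketched roughly as follows. This is an illustrative stand-in, not the paper's implementation: `toy_perplexity`, the example question, and all names here are hypothetical, and in practice a language model's perplexity over the gold rationale would play the role of the toy scorer. A snippet is labeled informative when conditioning on it lowers the rationale's perplexity:

```python
import math
from typing import Callable, List, Tuple

def label_snippets(
    question: str,
    rationale: str,
    snippets: List[str],
    perplexity: Callable[[str, str], float],
) -> List[Tuple[str, int]]:
    """Label a snippet 1 (informative) if conditioning on it lowers the
    perplexity of the gold rationale, else 0 (distractor)."""
    base = perplexity(question, rationale)  # perplexity with no snippet
    labels = []
    for snippet in snippets:
        with_snippet = perplexity(question + "\n" + snippet, rationale)
        labels.append((snippet, int(with_snippet < base)))
    return labels

def _norm(token: str) -> str:
    return token.lower().strip(".,:;?!()")

def toy_perplexity(context: str, rationale: str) -> float:
    """Toy stand-in for an LM scorer: rationale tokens that appear in the
    context count as low-surprisal, all other tokens as high-surprisal."""
    ctx = {_norm(t) for t in context.split()}
    surprisal = [1.0 if _norm(t) in ctx else 4.0 for t in rationale.split()]
    return math.exp(sum(surprisal) / len(surprisal))

labels = label_snippets(
    question="Which vitamin deficiency causes scurvy?",
    rationale="Scurvy is caused by a deficiency of vitamin C (ascorbic acid).",
    snippets=[
        "Vitamin C deficiency leads to scurvy.",  # lowers perplexity -> 1
        "Femurs are long bones.",                 # no effect -> 0
    ],
    perplexity=toy_perplexity,
)
```

Labels produced this way can then serve as supervision for a small classifier that filters retrieved snippets at inference time, when no gold rationale is available.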