Corrective Retrieval Augmented Generation
Abstract: LLMs inevitably exhibit hallucinations, since the accuracy of generated text cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practical complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval results. In addition, a decompose-then-recompose algorithm is designed for retrieved documents to selectively focus on key information and filter out irrelevant content. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.
Explain it Like I'm 14
What is this paper about?
This paper is about making AI text generators (like chatbots) more reliable when they look up information. Today, many AIs use a method called retrieval-augmented generation (RAG): before answering, they “retrieve” documents from a database and then “generate” a response using those documents. The problem is that retrieval can go wrong—sometimes the AI grabs irrelevant or misleading pages—and then the AI’s answer can be incorrect. The authors propose a fix called Corrective Retrieval-Augmented Generation (CRAG) that double-checks the retrieved documents and corrects course when needed.
What questions did the researchers ask?
They focused on three simple questions:
- How can we tell if the documents an AI retrieves are actually helpful for the question?
- What should the AI do if the retrieved documents are unhelpful or uncertain?
- Can we use a smarter way to pick out only the most important parts of documents so the AI doesn’t get distracted?
How did they do it?
Think of the system like a careful student with a helpful librarian and a quality checker:
- A student (the AI writer) wants to answer a question.
- The librarian (the retriever) brings some articles.
- A quality checker (CRAG’s evaluator) scores how relevant each article is to the question.
Based on that score, CRAG takes one of three actions. Here are the key parts:
- A lightweight “retrieval evaluator” (the quality checker)
- This is a smaller AI that looks at the question and each retrieved document and gives a relevance score. It’s like asking, “Does this document actually help answer the question?”
- Three possible actions depending on confidence
- Correct: If at least one document looks clearly relevant, use it—but carefully.
- Incorrect: If all documents look irrelevant, throw them out and search the web instead.
- Ambiguous: If it’s not clear, combine both the refined retrieved docs and web results to be safe.
- Decompose-then-recompose (careful note-taking)
- Even a good article can be long and contain distractions. CRAG cuts documents into small chunks (like sentences or short paragraphs), keeps only the pieces that directly help answer the question, and then stitches those key pieces back together. This reduces noise and helps the AI focus.
- Web search as backup
- If the original database doesn’t have good info, CRAG rewrites the question into simple search keywords (like a human would) and uses a web search API to find pages. It then applies the same chunking-and-filtering step to these web pages.
- Plug-and-play design
- CRAG is like an add-on that can work with different RAG systems. The authors tested it with standard RAG and a stronger version called Self-RAG.
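The three-way correction logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the evaluator here is a naive word-overlap stand-in (the paper fine-tunes a small model for this), and the thresholds, helper names, and `web_search` stub are all hypothetical.

```python
UPPER = 0.6   # above this, at least one document counts as clearly relevant
LOWER = -0.6  # below this, a document counts as clearly irrelevant

def evaluate(query: str, doc: str) -> float:
    """Stand-in for the lightweight retrieval evaluator.

    Uses naive word overlap mapped to [-1, 1]; the real system
    fine-tunes a small model to score (query, document) relevance.
    """
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    if not q_words:
        return 0.0
    overlap = len(q_words & d_words) / len(q_words)
    return 2 * overlap - 1

def decompose_then_recompose(query: str, docs: list[str]) -> str:
    """Split documents into chunks and keep only query-relevant chunks."""
    keep = []
    for doc in docs:
        for chunk in doc.split(". "):
            if evaluate(query, chunk) > 0:
                keep.append(chunk.strip())
    return ". ".join(keep)

def web_search(query: str) -> list[str]:
    """Placeholder for keyword rewriting plus a web search API call."""
    return []

def crag_knowledge(query: str, retrieved: list[str]) -> str:
    scores = [evaluate(query, d) for d in retrieved]
    if max(scores, default=LOWER) > UPPER:      # Correct: refine and use
        return decompose_then_recompose(query, retrieved)
    if all(s < LOWER for s in scores):          # Incorrect: fall back to web
        return decompose_then_recompose(query, web_search(query))
    # Ambiguous: combine refined retrieval with web results
    return decompose_then_recompose(query, retrieved + web_search(query))
```

In the real system, the refined text returned by `crag_knowledge` would be handed to the generator as context in place of the raw retrieved documents.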
What did they find?
The authors tested CRAG on four types of tasks:
- PopQA: short factual questions
- Biography: long-form writing about a person, scored for factual correctness
- PubHealth: true/false health claims
- ARC-Challenge: multiple-choice science questions
Main results and why they matter:
- CRAG consistently improved accuracy and factual quality compared to standard RAG and even improved over Self-RAG in many cases. In plain terms: fewer mistakes and better use of the right information.
- The retrieval evaluator (the quality checker) was very good at judging document relevance—better than simply asking a general-purpose AI to judge relevance. This is important because good decisions early on lead to better answers.
- CRAG was more robust when retrieval got worse. Even if the retriever brought back fewer correct documents, CRAG’s performance dropped less than other methods, thanks to its backup plan (web search) and careful filtering.
- Every part mattered. When the authors removed any single component—like the three-way action choice, document refinement, or search-query rewriting—performance got worse. This shows the design choices were all useful.
Why it matters and what could happen next
CRAG makes AI answers more trustworthy by:
- Checking whether the retrieved information is actually helpful.
- Using web search when the initial documents aren’t good enough.
- Keeping only the most relevant parts of documents to avoid confusion.
- Working as a simple attachment to many existing RAG systems.
This could lead to better AI assistants for homework help, research, health information, and science learning—where getting facts right really matters.
What’s next:
- The evaluator still needs to be trained (fine-tuned), which takes some effort.
- Web sources can be unreliable or biased, so future work should keep improving how the system picks trustworthy information.
- Better ways to detect and correct wrong information could make the system even more reliable.