
Corrective Retrieval Augmented Generation

Published 29 Jan 2024 in cs.CL (arXiv:2401.15884v3)

Abstract: LLMs inevitably exhibit hallucinations since the accuracy of generated texts cannot be secured solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practicable complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong. To this end, we propose the Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of retrieved documents for a query, returning a confidence degree based on which different knowledge retrieval actions can be triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web searches are utilized as an extension for augmenting the retrieval results. Besides, a decompose-then-recompose algorithm is designed for retrieved documents to selectively focus on key information and filter out irrelevant information in them. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.

Citations (43)

Summary

  • The paper introduces CRAG, which deploys a lightweight retrieval evaluator to classify document relevance and adjust retrieval actions accordingly.
  • It employs a decomposition-recomposition algorithm combined with dynamic web search integration to selectively refine and augment corpus information.
  • Experiments across four datasets demonstrate CRAG’s superior performance and robustness compared to traditional RAG and self-correcting methods.

Introduction

LLMs have gained significant traction for their impressive language generation abilities. However, they often grapple with factual errors and hallucinations, highlighting inherent limitations in their parametric knowledge. While Retrieval-Augmented Generation (RAG) systems offer a practical supplement to LLMs by augmenting generation with external documents, their efficacy is critically dependent on the relevance and accuracy of these documents. Current approaches, however, may indiscriminately integrate irrelevant information, undermining the robustness of generation.

Corrective Strategies for Robust Generation

The paper introduces the Corrective Retrieval Augmented Generation (CRAG) methodology, designed to enhance the resilience of RAG systems. CRAG pairs a lightweight retrieval evaluator with document-handling strategies that vary according to the evaluator's output: based on the estimated relevance of the retrieved documents, each retrieval is classified as Correct, Incorrect, or Ambiguous, and a corresponding knowledge action is triggered. A decompose-then-recompose algorithm addresses the limitations of static corpora by selectively focusing on essential information and discarding irrelevant content.
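The confidence-triggered action selection can be sketched as follows; the threshold values and the `select_action` name are illustrative assumptions for exposition, not the paper's exact implementation:

```python
# Hedged sketch: map retrieval-evaluator scores to CRAG's three actions.
# The thresholds (0.6, -0.9) are placeholders; the paper tunes its own.
def select_action(scores, upper=0.6, lower=-0.9):
    """Choose a knowledge action from per-document relevance scores."""
    if max(scores) > upper:       # at least one clearly relevant document
        return "correct"
    if max(scores) < lower:       # every document judged irrelevant
        return "incorrect"
    return "ambiguous"            # mixed or uncertain evidence
```

Only the best-scoring document needs to clear the upper bar for the Correct action, while the Incorrect action (and the web-search fallback it triggers) fires only when all documents fall below the lower bar.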

Web Search Integration and Experimentation

A novel aspect of CRAG is its integration with large-scale web search for cases where retrieval is deemed Incorrect. Leveraging the dynamic nature of the web, CRAG broadens the scope and variety of information at the model's disposal, ensuring a rich set of external knowledge to amend the initial corpus results. CRAG is demonstrated to be a plug-and-play method compatible with existing RAG-based approaches. Validation across four datasets highlights its performance gains and its generalizability across tasks demanding both short- and long-form generation.
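Before hitting the web, CRAG rewrites the input question into keyword form suitable for a search engine. A minimal sketch of such a rewriting step is shown below; the stopword list and regex tokenization are illustrative assumptions (the paper's rewriter is more sophisticated):

```python
import re

# Illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"what", "is", "the", "a", "an", "of", "who", "whom",
             "when", "where", "how", "did", "does", "do", "in"}

def rewrite_query(question):
    """Strip question words and stopwords to form search keywords."""
    tokens = re.findall(r"[A-Za-z0-9']+", question.lower())
    return " ".join(t for t in tokens if t not in STOPWORDS)
```

For example, `rewrite_query("Who is the director of Inception?")` yields the keyword query `"director inception"`, which would then be passed to a web search API.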

Conclusion and Contribution

CRAG's design for self-correcting and refining the use of retrieved documents marks a notable step toward addressing RAG's existing pitfalls. The retrieval evaluator is central: it helps avoid the inclusion of misleading information and triggers robust actions based on document assessment. The approach shows significant improvements over standard RAG and Self-RAG, indicating wide applicability in scenarios where the integrity of retrieved information is questionable. The experiments demonstrate CRAG's adaptability to RAG-based methods and its effectiveness across both short- and long-form generation tasks.


Explain it Like I'm 14

What is this paper about?

This paper is about making AI text generators (like chatbots) more reliable when they look up information. Today, many AIs use a method called retrieval-augmented generation (RAG): before answering, they “retrieve” documents from a database and then “generate” a response using those documents. The problem is that retrieval can go wrong—sometimes the AI grabs irrelevant or misleading pages—and then the AI’s answer can be incorrect. The authors propose a fix called Corrective Retrieval-Augmented Generation (CRAG) that double-checks the retrieved documents and corrects course when needed.

What questions did the researchers ask?

They focused on three simple questions:

  • How can we tell if the documents an AI retrieves are actually helpful for the question?
  • What should the AI do if the retrieved documents are unhelpful or uncertain?
  • Can we use a smarter way to pick out only the most important parts of documents so the AI doesn’t get distracted?

How did they do it?

Think of the system like a careful student with a helpful librarian and a quality checker:

  • A student (the AI writer) wants to answer a question.
  • The librarian (the retriever) brings some articles.
  • A quality checker (CRAG’s evaluator) scores how relevant each article is to the question.

Based on that score, CRAG takes one of three actions. Here are the key parts:

  • A lightweight “retrieval evaluator” (the quality checker)
    • This is a smaller AI that looks at the question and each retrieved document and gives a relevance score. It’s like asking, “Does this document actually help answer the question?”
  • Three possible actions depending on confidence
    • Correct: If at least one document looks clearly relevant, use it—but carefully.
    • Incorrect: If all documents look irrelevant, throw them out and search the web instead.
    • Ambiguous: If it’s not clear, combine both the refined retrieved docs and web results to be safe.
  • Decompose-then-recompose (careful note-taking)
    • Even a good article can be long and contain distractions. CRAG cuts documents into small chunks (like sentences or short paragraphs), keeps only the pieces that directly help answer the question, and then stitches those key pieces back together. This reduces noise and helps the AI focus.
  • Web search as backup
    • If the original database doesn’t have good info, CRAG rewrites the question into simple search keywords (like a human would) and uses a web search API to find pages. It then applies the same chunking-and-filtering step to these web pages.
  • Plug-and-play design
    • CRAG is like an add-on that can work with different RAG systems. The authors tested it with standard RAG and a stronger version called Self-RAG.
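The "careful note-taking" step above can be sketched in a few lines. This is a simplified illustration: the real evaluator is a fine-tuned model, so a toy lexical-overlap scorer stands in here to keep the example self-contained, and splitting on periods is a crude stand-in for the paper's strip segmentation:

```python
def toy_relevance(question, strip):
    """Stand-in scorer: fraction of question words appearing in the strip."""
    q = set(question.lower().split())
    s = set(strip.lower().split())
    return len(q & s) / max(len(q), 1)

def decompose_then_recompose(question, documents, score_fn=toy_relevance,
                             keep=0.3):
    """Split documents into strips, keep relevant strips, stitch them back."""
    strips = [s.strip() for doc in documents
              for s in doc.split(".") if s.strip()]
    kept = [s for s in strips if score_fn(question, s) >= keep]
    return ". ".join(kept)
```

Given the question "when was paris founded" and a document mixing a founding fact with an off-topic sentence about the weather, only the founding sentence survives the filter, so the generator sees less distracting noise.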

What did they find?

The authors tested CRAG on four types of tasks:

  • PopQA: short factual questions
  • Biography: long-form writing about a person, scored for factual correctness
  • PubHealth: true/false health claims
  • ARC-Challenge: multiple-choice science questions

Main results and why they matter:

  • CRAG consistently improved accuracy and factual quality compared to standard RAG and even improved over Self-RAG in many cases. In plain terms: fewer mistakes and better use of the right information.
  • The retrieval evaluator (the quality checker) was very good at judging document relevance—better than simply asking a general-purpose AI to judge relevance. This is important because good decisions early on lead to better answers.
  • CRAG was more robust when retrieval got worse. Even if the retriever brought back fewer correct documents, CRAG’s performance dropped less than other methods, thanks to its backup plan (web search) and careful filtering.
  • Every part mattered. When the authors removed any single component—like the three-way action choice, document refinement, or search-query rewriting—performance got worse. This shows the design choices were all useful.

Why it matters and what could happen next

CRAG makes AI answers more trustworthy by:

  • Checking whether the retrieved information is actually helpful.
  • Using web search when the initial documents aren’t good enough.
  • Keeping only the most relevant parts of documents to avoid confusion.
  • Working as a simple attachment to many existing RAG systems.

This could lead to better AI assistants for homework help, research, health information, and science learning—where getting facts right really matters.

What’s next:

  • The evaluator still needs to be trained (fine-tuned), which takes some effort.
  • Web sources can be unreliable or biased, so future work should keep improving how the system picks trustworthy information.
  • Better ways to detect and correct wrong information could make the system even more reliable.
