
ECoRAG: Evidentiality-guided Compression for Long Context RAG

Published 5 Jun 2025 in cs.CL, cs.AI, and cs.IR | (2506.05167v2)

Abstract: LLMs have shown remarkable performance in Open-Domain Question Answering (ODQA) by leveraging external documents through Retrieval-Augmented Generation (RAG). To reduce the RAG overhead caused by longer contexts, context compression is necessary. However, prior compression methods do not focus on filtering out non-evidential information, which limits performance in LLM-based RAG. We thus propose the Evidentiality-guided RAG, or ECoRAG, framework. ECoRAG improves LLM performance by compressing retrieved documents based on evidentiality, ensuring that answer generation is supported by the correct evidence. As an additional step, ECoRAG reflects on whether the compressed content provides sufficient evidence and, if not, retrieves more until it is sufficient. Experiments show that ECoRAG improves LLM performance on ODQA tasks, outperforming existing compression methods. Furthermore, ECoRAG is highly cost-efficient, as it not only reduces latency but also minimizes token usage by retaining only the information necessary to generate the correct answer. Code is available at https://github.com/ldilab/ECoRAG.

Summary

  • The paper introduces ECoRAG, a framework that enhances long-context RAG by filtering non-evidential information to improve ODQA accuracy.
  • It employs a dual encoder and an evidentiality evaluator to rank and compress sentences, ensuring essential evidence is retained while reducing latency.
  • Experimental results on datasets like NQ and TQA demonstrate that ECoRAG outperforms existing methods with higher EM and F1 scores using fewer tokens.


The paper presents ECoRAG, a framework designed to enhance Retrieval-Augmented Generation (RAG) systems in handling long contexts by introducing a mechanism for evidentiality-guided compression. The primary objective of ECoRAG is to improve Open-Domain Question Answering (ODQA) by efficiently compressing retrieved documents, ensuring that the LLMs generate answers based on necessary and correct evidence. This approach mitigates the issues of latency and computational cost associated with processing lengthy contexts, while also improving the accuracy of the generated answers.

Evidentiality-Guided Compression

The traditional RAG framework employs external document retrieval to aid LLMs in ODQA tasks. However, the increased context length from retrieved documents often introduces non-evidential information, detracting from answer quality and increasing computational costs. Unlike prior methods, ECoRAG specifically filters out non-evidential content by prioritizing sentences that directly support the generation of the correct answer. The framework defines evidentiality hierarchically, distinguishing between strong and weak evidence and identifying distractors that could mislead the LLM.

To achieve this, ECoRAG uses a dual-encoder compressor that scores how much each individual sentence contributes to producing the final answer. The compressor is trained with evidentiality labels obtained from prior LLM-generated evaluations, optimizing it to rank sentences effectively by their evidentiality.
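The rank-then-compress step can be sketched as follows. This is a minimal illustration, not the paper's implementation: `score_fn` stands in for ECoRAG's trained dual-encoder compressor, and the `token_overlap` scorer is a toy stand-in used only for demonstration.

```python
def rank_by_evidentiality(question, sentences, score_fn):
    """Sort sentences by an evidentiality score, strongest evidence first."""
    return sorted(sentences, key=lambda s: score_fn(question, s), reverse=True)

def compress(question, sentences, score_fn, top_k=3):
    """Keep only the top-k most evidential sentences as the compressed context."""
    return rank_by_evidentiality(question, sentences, score_fn)[:top_k]

def token_overlap(question, sentence):
    """Toy stand-in scorer: fraction of question tokens that appear in the sentence."""
    q_tokens = set(question.lower().split())
    s_tokens = set(sentence.lower().split())
    return len(q_tokens & s_tokens) / max(len(q_tokens), 1)
```

For example, `compress("What is the capital of France?", docs, token_overlap, top_k=1)` would keep a sentence mentioning France's capital ahead of unrelated ones; in ECoRAG the scorer is a learned model, so rankings reflect evidential support rather than surface overlap.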

Adaptive Compression and Efficient Retrieval

ECoRAG introduces a distinctive feature called evidentiality reflection, employing an evidentiality evaluator to assess whether the current compression sufficiently supports answer generation. Using a lightweight model for fast evaluation, the framework adaptively adjusts the compression by iteratively adding more evidence until the compressed content meets the evidentiality threshold needed for accurate answer generation. This process ensures that the necessary evidence is retained while keeping latency low by avoiding excessive iterations.
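The reflection loop described above can be sketched as below. This is a hedged illustration under stated assumptions: `is_sufficient` is a hypothetical stand-in for ECoRAG's lightweight evidentiality evaluator, and the fixed `step` size is an illustrative choice rather than the paper's schedule.

```python
def adaptive_compress(ranked_sentences, is_sufficient, step=2):
    """Grow the compressed context in evidentiality order until the
    evaluator judges it sufficient, or all sentences have been added.

    ranked_sentences: sentences already sorted strongest-evidence-first.
    is_sufficient: callable taking the current context (list of sentences)
                   and returning True once it supports answer generation.
    """
    context = []
    for i in range(0, len(ranked_sentences), step):
        context.extend(ranked_sentences[i:i + step])  # add the next batch of evidence
        if is_sufficient(context):  # lightweight evidentiality check
            break
    return context
```

If the evaluator never reports sufficiency, the loop falls back to using all ranked sentences, mirroring the framework's behavior of retrieving more evidence until enough is available.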

Experimental Validation and Implications

The framework's efficacy was demonstrated across multiple datasets, including NQ, TQA, and WQ. ECoRAG consistently outperformed other compression methods, such as RECOMP and LLMLingua, in both EM and F1 scores, while significantly reducing the token count. Its ability to cut both token usage and latency demonstrates its practical value for improving RAG-system efficiency, especially in scenarios with large retrieval corpora.

Practical and Theoretical Implications

Practically, ECoRAG offers a robust solution for improving the efficiency and accuracy of LLMs employed in domains requiring long-context understanding, such as ODQA. Theoretically, this framework advances understanding of how evidentiality can guide the compression of informational context, highlighting the importance of dynamic evidence assessment. It encourages future work in AI models to incorporate evidentiality as a core component for optimizing task-specific performances.

In conclusion, ECoRAG exemplifies a significant step in addressing the challenges faced by RAG systems in managing long contexts, contributing to more efficient and accurate LLM performances. This research opens avenues for applying evidentiality-guided processes to broader AI applications beyond ODQA, suggesting a potentially transformative approach to context handling in LLMs.
