DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation

Published 2 Jun 2025 in cs.CL, cs.AI, and cs.LG | (2506.01954v1)

Abstract: Retrieval-Augmented Generation (RAG) methods have proven highly effective for tasks requiring factual consistency and robust knowledge retrieval. However, large-scale RAG systems consume significant computational resources and are prone to generating hallucinated content. In this work, we introduce DRAG, a novel framework for distilling RAG knowledge from large-scale LLMs into small LMs (SLMs). Our approach leverages evidence- and knowledge graph-based distillation, ensuring that the distilled model retains critical factual knowledge while significantly reducing model size and computational cost. By aligning the smaller model's predictions with a structured knowledge graph and ranked evidence, DRAG effectively mitigates hallucinations and improves factual accuracy. We further present a case demonstrating how our framework mitigates user privacy risks and introduce a corresponding benchmark. Experimental evaluations on multiple benchmarks demonstrate that our method outperforms prior competitive RAG methods for SLMs, such as MiniRAG, by up to 27.7% using the same models, while preserving high efficiency and reliability. With DRAG, we provide a practical and resource-efficient roadmap to deploying enhanced retrieval and generation capabilities in small-sized LLMs.


Authors (6)

Summary

The paper introduces DRAG, a framework that distills retrieval-augmented generation from LLMs to SLMs, reducing hallucinations and improving factual accuracy.
The framework employs a multi-stage process combining evidence generation, ranking, and graph-based knowledge representation to optimize computational efficiency.
Experimental results on benchmarks such as ARC-C, MedMCQA, and MMLU show that DRAG outperforms prior RAG methods such as MiniRAG by up to 27.7% when using the same SLMs.


Introduction

The research introduces DRAG, a framework that distills Retrieval-Augmented Generation (RAG) knowledge from large language models (LLMs) into small language models (SLMs). DRAG targets two weaknesses of existing RAG systems, high computational cost and hallucination, by using evidence- and graph-based distillation. This matters for deploying RAG in resource-constrained environments without sacrificing factual accuracy or retrieval capability.

Figure 1: Framework Overview of Our Evidence- and Graph-based RAG Distillation. Given a user query, the approach processes and distills the necessary information through structured evidence and graph-based mechanisms.

Methodology

DRAG operates as a multi-stage pipeline: evidence generation, evidence ranking, graph-based knowledge representation, and answer generation with the SLM. First, a large-scale LLM generates evidence relevant to the query (Figure 1). This evidence is then ranked and filtered so that only relevant, accurate items are kept. Next, structured knowledge is extracted as a graph of entity relationships, which captures the essential facts without the redundancy of raw text.
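The summary does not spell out how evidence is scored or how the graph is stored. As a rough illustration of the ranking and graph stages, the following Python sketch assumes a weighted blend of query similarity and an LLM-assigned rank, with bag-of-words cosine similarity standing in for a real embedding model, and represents the graph as deduplicated (head, relation, tail) triples. The names `rank_evidence`, `build_graph`, `llm_ranks`, `alpha`, and `top_k` are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercased alphanumeric tokens (a stand-in for a real embedding tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    # Bag-of-words cosine similarity; ASSUMPTION: stands in for embedding similarity.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(count * vb[token] for token, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_evidence(query: str, evidence: list[str], llm_ranks: list[int],
                  alpha: float = 0.5, top_k: int = 3) -> list[str]:
    """Hypothetical ranker: blend query similarity with an LLM-assigned rank.

    llm_ranks[i] is the rank the LLM gave evidence[i] (1 = most relevant).
    """
    n = len(evidence)
    scored = []
    for text, rank in zip(evidence, llm_ranks):
        llm_score = (n - rank + 1) / n  # maps rank 1..n onto 1.0..1/n
        score = alpha * cosine_similarity(query, text) + (1 - alpha) * llm_score
        scored.append((score, text))
    # Highest combined score first; keep only the top_k items.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]


def build_graph(triples: list[tuple[str, str, str]]) -> dict[str, set[tuple[str, str]]]:
    """Hypothetical graph builder: merge (head, relation, tail) triples, dropping duplicates."""
    graph: dict[str, set[tuple[str, str]]] = {}
    for head, relation, tail in triples:
        graph.setdefault(head, set()).add((relation, tail))
    return graph


query = "What gas do plants absorb for photosynthesis?"
evidence = [
    "Plants absorb carbon dioxide during photosynthesis.",
    "The Eiffel Tower is in Paris.",
    "Photosynthesis converts light energy into chemical energy in plants.",
]
# Keeps the first and third evidence strings, in that order.
print(rank_evidence(query, evidence, llm_ranks=[1, 3, 2], top_k=2))
```

Setting `alpha` closer to 1 leans on lexical overlap with the query, closer to 0 on the LLM's own judgment. Storing relations as a set per entity is one simple way to drop the repeated facts that raw evidence text tends to contain.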

In the final stage, the SLM generates accurate, contextually grounded responses from this distilled knowledge. Aligning the smaller model's predictions with the ranked evidence and the knowledge graph helps mitigate hallucination while keeping inference efficient.
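One way to picture the final stage is as prompt construction for the SLM. This sketch assumes the distilled context is passed as plain text, with ranked evidence listed first and graph edges rendered as `head -[relation]-> tail` lines, followed by a lettered multiple-choice question. The layout and the functions `build_slm_prompt` and `format_graph` are hypothetical, not the paper's actual template.

```python
import string


def format_graph(graph: dict[str, set[tuple[str, str]]]) -> list[str]:
    # Render each edge as "head -[relation]-> tail"; sorting keeps output deterministic.
    return [f"{head} -[{relation}]-> {tail}"
            for head in sorted(graph)
            for relation, tail in sorted(graph[head])]


def build_slm_prompt(question: str, options: list[str], evidence: list[str],
                     graph: dict[str, set[tuple[str, str]]]) -> str:
    """Hypothetical prompt layout: ranked evidence, then graph edges, then the question."""
    parts = ["Evidence:"]
    parts += [f"{i}. {text}" for i, text in enumerate(evidence, start=1)]
    parts.append("Knowledge graph:")
    parts += format_graph(graph)
    parts.append(f"Question: {question}")
    # Label options A, B, C, ... as in multiple-choice QA benchmarks.
    parts += [f"{label}. {text}" for label, text in zip(string.ascii_uppercase, options)]
    parts.append("Answer with the letter of the correct option.")
    return "\n".join(parts)


prompt = build_slm_prompt(
    "Which gas do plants absorb during photosynthesis?",
    ["Oxygen", "Carbon dioxide", "Nitrogen", "Helium"],
    ["Plants absorb carbon dioxide during photosynthesis."],
    {"plant": {("absorbs", "carbon dioxide")}},
)
print(prompt)
```

Compact graph edges give the SLM the key relations in far fewer tokens than the full evidence passages, which is the efficiency argument the paragraph above makes.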

Figure 2: Effect of retrieved graph-based and evidence-based RAG on multiple-choice QA tasks, demonstrating superior accuracy and consistency.

Experimental Results

DRAG's efficacy is illustrated through evaluations on multiple benchmarks such as ARC-C, MedMCQA, and MMLU. The results reveal that DRAG outperforms existing methods like MiniRAG and SimRAG by substantial margins, particularly in factual accuracy and computational efficiency. For instance, DRAG achieved a 93.0% score on ARC-C, representing a 27.7% improvement over prior methods.
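ARC-C, MedMCQA, and MMLU are multiple-choice benchmarks, so scoring typically reduces to comparing a predicted option letter with the gold label. The summary does not describe how SLM outputs are parsed; the sketch below assumes a simple regex heuristic that extracts an option letter and then computes accuracy. `extract_choice` and `accuracy` are hypothetical helpers, not part of DRAG.

```python
import re
from typing import Optional


def extract_choice(response: str, labels: str = "ABCD") -> Optional[str]:
    """Heuristic parser (an assumption): return the option letter an SLM picked."""
    # Prefer an explicit "answer: X" or "answer is (X)"; the keyword is case-insensitive.
    explicit = re.search(rf"(?i:answer)\s*(?i:is)?\s*:?\s*\(?([{labels}])\b", response)
    if explicit:
        return explicit.group(1)
    # Otherwise fall back to the first standalone capital option letter.
    bare = re.search(rf"\b([{labels}])\b", response)
    return bare.group(1) if bare else None


def accuracy(responses: list[str], gold: list[str]) -> float:
    # Fraction of questions whose extracted letter matches the gold label.
    if not gold:
        return 0.0
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return correct / len(gold)


responses = ["Answer: B", "The answer is (C).", "D", "I am not sure"]
gold = ["B", "C", "A", "A"]
print(accuracy(responses, gold))  # 0.5
```

The fallback rule is deliberately crude (a sentence starting with the article "A" would be read as option A), so a real evaluation harness would usually constrain the output format more tightly.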

In addition, DRAG showcases competitive performance across diverse LLMs and SLMs, reinforcing its robustness and adaptability in various retrieval-augmented scenarios. The framework's ability to enhance factual accuracy while preserving computational resources is a significant advantage for real-world applications.

Figure 3: More results on Retrieval Strategies, illustrating the comparative effectiveness of different approaches in harnessing retrieved knowledge.

Implications and Future Directions

The DRAG framework sets a new precedent for deploying retrieval-augmented generation in SLMs. By combining evidence-based and graph-based distillation, DRAG narrows the resource-efficiency gap between large and small language models. The paper also demonstrates how the framework can mitigate user privacy risks and introduces a corresponding benchmark, highlighting a further practical use.

Future research could further optimize the evidence generation and ranking stages, incorporate more sophisticated privacy mechanisms, and extend the framework to a broader range of LLM architectures and domains.

Conclusion

DRAG represents a significant advancement in the distillation of retrieval-augmented generation capabilities, offering a scalable solution that balances accuracy, efficiency, and factual consistency. Its success in improving SLM performance underscores its potential for widespread adoption in environments with tight computational and resource budgets.
