
In-Context Retrieval-Augmented Generation (RAG)

Updated 12 February 2026
  • In-Context RAG is a framework that combines large language models with external retrievers to dynamically incorporate evidence into responses.
  • It employs reinforcement learning and curriculum learning to optimize answer generation, citation accuracy, and robustness against distractors.
  • Empirical evaluations show significant joint F1 improvements on benchmarks like HotpotQA and MuSiQue under realistic retrieval conditions.

In-Context Retrieval-Augmented Generation (RAG) is an advanced paradigm in which LLMs are augmented with external retrieval modules, enabling the model to condition its outputs on dynamically retrieved evidence at inference time. In this architecture, both the retrieval and the generative processes take place “in context,” i.e., without parameter updates to the LLM, but rather by concatenating the retrieved passages to the prompt presented to the model. Recent research has focused on significantly enhancing retrieval-augmented generation by shifting some responsibilities traditionally handled by retrievers to the generation model, applying reinforcement learning (RL) and curriculum learning, and leveraging diverse context composition and training objectives to improve faithfulness, scalability, and citation accuracy (Huang et al., 17 Mar 2025, Gupta et al., 2024).

1. Core Architecture and Problem Formulation

The fundamental structure of in-context RAG comprises two principal modules: a retriever and a generator. Upon receiving a user query $q$, the retriever computes a similarity score between the query and each document $d$ in the corpus $\mathcal{D}$, returning the top-$k$ relevant documents $\{d_1, \ldots, d_k\}$. The generator, typically a pretrained LLM, then produces an output (answer $a$) by conditioning on the concatenated input $[q; d_1; \ldots; d_k]$.

Mathematically, the retrieval distribution is defined as
$$P_{\text{ret}}(d \mid q) = \text{softmax}_d\big(\text{sim}(\text{Enc}_q(q), \text{Enc}_d(d))\big)$$
and the generator produces a sequence $o_{1:T}$ according to
$$P_{\text{gen}}(o_{1:T} \mid q, \{d_i\}) = \prod_{t=1}^T P_{\text{gen}}(o_t \mid q, \{d_i\}, o_{<t}).$$
The marginal probability of producing a final answer $a$ is then
$$P_{\text{gen}}(a \mid q, \{d_i\}) = \sum_{o_{1:T} \rightarrow a} P_{\text{gen}}(o_{1:T} \mid q, \{d_i\}).$$
RAG-RL extends this setup: the answer generator is tasked not only with producing the answer but also with identifying and citing the passages truly relevant to it (Huang et al., 17 Mar 2025).
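The retrieve-then-concatenate pipeline above can be sketched in a few lines of Python. This is a minimal illustration with hand-made toy embeddings and dot-product similarity; the corpus, encoders, and prompt template are all hypothetical stand-ins, not the system's actual components:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve_top_k(query_emb, doc_embs, k):
    """Score each document by sim(Enc_q(q), Enc_d(d)) (here: dot product),
    form the retrieval distribution P_ret(d | q), and return the indices
    of the top-k documents."""
    scores = [dot(query_emb, d) for d in doc_embs]
    p_ret = softmax(scores)
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return top_k, p_ret

def build_prompt(query, docs, top_k):
    """Concatenate [q; d_1; ...; d_k] into a single in-context prompt."""
    passages = "\n\n".join(f"[{i + 1}] {docs[i]}" for i in top_k)
    return f"{passages}\n\nQuestion: {query}\nAnswer:"

# Toy corpus with made-up 3-dimensional "embeddings" (hypothetical encoders).
docs = ["Paris is the capital of France.",
        "The Nile is a river in Africa.",
        "France borders Germany."]
doc_embs = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.9], [0.7, 0.3, 0.1]]
query_emb = [1.0, 0.0, 0.0]

top_k, p_ret = retrieve_top_k(query_emb, doc_embs, k=2)
prompt = build_prompt("What is the capital of France?", docs, top_k)
```

Note that no LLM parameters are touched anywhere in this loop: the retrieved passages enter purely through the prompt, which is the defining property of "in-context" RAG.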

2. Reinforcement Learning Formulation and Rule-Based Rewards

RAG-RL formulates answer generation as a Markov Decision Process, where the state at step $t$ is $s_t = (q, \{d_i\}, o_{<t})$ and the action is the next output token $o_t$. The episode concludes when a special end-of-answer token is emitted. Rewards are issued only at trajectory completion and are composed of three components:
$$R(\tau) = R_{\text{answer}}(\tau) + R_{\text{citations}}(\tau) + R_{\text{formatting}}(\tau)$$
where

  • $R_{\text{answer}} = \gamma_{\text{ans}} \cdot \mathbb{1}(\text{gen}_{\text{answer}} = \text{gold answer})$,
  • $R_{\text{citations}} = \gamma_{\text{corr}} \cdot \text{Recall}(\text{cited}, \text{gold citations}) - \gamma_{\text{inc}} \cdot (\#\,\text{incorrect citations})$,
  • $R_{\text{formatting}}$ is a formatting correctness bonus/penalty.

GRPO (Group Relative Policy Optimization) provides the policy-gradient update backbone. This RL framework encourages the model to optimize for both answer factuality and passage-level citation faithfulness under realistic retrieval scenarios with distractors (Huang et al., 17 Mar 2025).
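The three-part terminal reward can be sketched directly from the definitions above. This is a minimal sketch: the $\gamma$ weights, the exact-match answer check, and the set-based citation matching are illustrative assumptions, not the paper's reported configuration:

```python
def rag_rl_reward(gen_answer, gold_answer, cited, gold_citations,
                  well_formatted, g_ans=1.0, g_corr=1.0, g_inc=0.2, g_fmt=0.5):
    """Terminal reward R(tau) = R_answer + R_citations + R_formatting,
    issued once when the trajectory ends (the gamma_* weights are
    illustrative, not the paper's exact values)."""
    # R_answer: indicator that the generated answer matches the gold answer
    r_answer = g_ans * float(gen_answer == gold_answer)
    # R_citations: recall over gold citations, minus a per-citation penalty
    # for each cited passage that is not in the gold set
    recall = len(set(cited) & set(gold_citations)) / max(len(gold_citations), 1)
    incorrect = len(set(cited) - set(gold_citations))
    r_citations = g_corr * recall - g_inc * incorrect
    # R_formatting: bonus if the output obeys the required answer/citation
    # format, penalty otherwise
    r_formatting = g_fmt if well_formatted else -g_fmt
    return r_answer + r_citations + r_formatting

# Correct answer, one correct citation out of two gold, one spurious citation:
# R = 1.0 + (1.0 * 0.5 - 0.2 * 1) + 0.5 = 1.8
r = rag_rl_reward("Paris", "Paris", cited=[1, 3], gold_citations=[1, 2],
                  well_formatted=True)
```

Because the reward is rule-based and computed only at episode end, it needs no learned reward model; GRPO then uses group-relative comparisons of such trajectory rewards to form its policy-gradient updates.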

3. Curriculum Learning and Example Difficulty Scheduling

To enhance learning stability and sample efficiency, RAG-RL applies explicit curriculum learning to the training data. Samples are assigned levels of difficulty based on the number of gold (supporting) passages and distractor (irrelevant) passages included in the retrieval set for each query. Difficulty scheduling functions, such as Max, Linear, and Min-Max, determine the progression of difficulty within each epoch.

  • The Min-Max curriculum, which interleaves easy (minimal distractor) and hard (maximal distractor) examples, yields the highest ultimate performance and the greatest resistance to noise among distractors.
  • Shuffling examples within difficulty levels has negligible negative effect on convergence or ultimate performance (Huang et al., 17 Mar 2025).
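One plausible reading of the Min-Max schedule, interleaving the easiest and hardest remaining examples, can be sketched as follows. The ordering rule and the difficulty measure (distractor count) are assumptions for illustration; the paper's exact scheduling functions may differ in detail:

```python
def min_max_schedule(samples, difficulty):
    """Order samples as easiest, hardest, next-easiest, next-hardest, ...
    `difficulty` maps a sample to a scalar, e.g. its distractor count."""
    ordered = sorted(samples, key=difficulty)
    out = []
    lo, hi = 0, len(ordered) - 1
    while lo <= hi:
        out.append(ordered[lo])      # easiest remaining example
        if lo != hi:
            out.append(ordered[hi])  # hardest remaining example
        lo += 1
        hi -= 1
    return out

# Hypothetical samples labelled by number of distractor passages.
samples = [{"q": "a", "distractors": 0}, {"q": "b", "distractors": 8},
           {"q": "c", "distractors": 2}, {"q": "d", "distractors": 5}]
order = min_max_schedule(samples, lambda s: s["distractors"])
# Training order by distractor count: 0, 8, 2, 5
```

The alternating structure exposes the model to heavy-distractor contexts from the first steps of training, which is one intuition for why this schedule builds more noise resistance than a strictly easy-to-hard (Linear) progression.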

4. Empirical Evaluation and Benchmarking

RAG-RL is evaluated on multi-hop open-domain question answering benchmarks, notably HotpotQA and MuSiQue. Metrics include Answer F1, passage-level citation F1, and joint F1 (requiring both correct answer and correct citations). In settings with many irrelevant distractors, RL- and curriculum-fine-tuned models significantly outperform supervised-only baselines, with findings such as:

  • On HotpotQA: Joint F1 rises from 45.6 (SFT) to 78.0 (RL Min-Max curriculum). On MuSiQue: from 25.6 (SFT) to 61.4 (RL Min-Max) (Huang et al., 17 Mar 2025).
  • Under ideal retriever conditions (only gold passages presented), joint F1 reaches 83.4 (HotpotQA) / 77.4 (MuSiQue), establishing new state of the art among generative readers for these datasets.

Ablation studies confirm that RL curriculum models are more robust as the number of distractors or hops increases; Min-Max and linear-shuffled curricula show best results.
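To make the joint metric concrete, here is one way to compute it: standard set-based F1 for the answer and for the citations, combined multiplicatively so that failure on either drives the joint score toward zero. The multiplicative combination is an assumption for illustration; the benchmarks' official joint-F1 definitions should be consulted for exact scoring:

```python
def f1(pred, gold):
    """Set-based F1 between predicted and gold items (tokens or citation ids)."""
    pred, gold = set(pred), set(gold)
    if not pred or not gold:
        return float(pred == gold)
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def joint_f1(answer_pred, answer_gold, cite_pred, cite_gold):
    """Joint score requiring both a correct answer and correct citations;
    multiplying the two F1 scores is one plausible instantiation."""
    return f1(answer_pred, answer_gold) * f1(cite_pred, cite_gold)

# Perfect answer, 2 of 3 gold citations found, none spurious:
# answer F1 = 1.0, citation F1 = 0.8, joint = 0.8
score = joint_f1(["paris"], ["paris"], [1, 2], [1, 2, 3])
```

Under any such coupling, a model that answers correctly but cites distractors (or cites well but answers wrongly) scores poorly, which is exactly the behavior the RL reward in Section 2 is designed to penalize.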

5. Division of Contextual Responsibility: Shifting Selection to Generation

A central principle of RAG-RL is to shift a portion of context selection and relevance discrimination from the retriever to the generator. This enables the system to tolerate larger retrieved sets, increasing recall, while relying on the generator’s fine-grained, RL-optimized scoring to select and cite only the truly relevant passages during generation. This division allows models to (i) recover from imperfect retriever precision at scale, and (ii) efficiently leverage very large retrieval pools without collapsing in quality (Huang et al., 17 Mar 2025).

6. Broader Methodological and Practical Implications

The RAG-RL work demonstrates that rule-based reward design—emphasizing answer correctness, citation recall, and formatting—can be sufficient to train large-scale LLMs for context-faithful, multi-document reasoning using only post-hoc RL atop supervised checkpoints. Empirically, a curriculum that interleaves easy and hard examples can accelerate learning and fortify resistance to noise compared to strictly “easy to hard” schedules. The study also underscores the challenge of balancing input context diversity (for recall) against the need for robust in-context selection (to avoid distraction), a balance made achievable by joint RL and curriculum fine-tuning (Huang et al., 17 Mar 2025).

RAG-RL represents a significant evolution from baseline in-context RAG, where the generator is typically left to use whatever is retrieved, and supervision is provided only for answers (not for citation or reasoning steps). By contrast, RAG-RL explicitly encodes both citation metrics and answer F1 into the RL reward, and systematically explores how training data structure (curriculum) shapes downstream performance. This aligns with recent trends emphasizing model robustness to retrieval errors, optimizing generator–retriever interface, and integrating retrieval signals more tightly into the generation loop (Gupta et al., 2024).

7. Limitations and Future Research

While RAG-RL demonstrates strong gains, ultimate performance still depends on retriever recall, especially on long multi-hop chains. Further improvements could derive from end-to-end retriever–generator co-training, adaptive reward shaping, or hybrid systems with both explicit and learned evidence scoring. Additionally, the scope for richer reward functions (e.g., using learned citation verification, fine-grained passage reasoning fidelity) remains largely unexplored.


References

  • "RAG-RL: Advancing Retrieval-Augmented Generation via RL and Curriculum Learning" (Huang et al., 17 Mar 2025)
  • "A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions" (Gupta et al., 2024)
