- The paper introduces CInGS, a technique that trains LLMs to integrate external context during instruction tuning to reduce hallucinations.
- The methodology adapts standard instruction tuning by prepending context to responses and masking it during loss computation, yielding an average 5.5% improvement on text benchmarks.
- Applying CInGS to vision-language models results in reduced hallucinations and enhanced factual consistency even in challenging, distracting contexts.
Context-Informed Grounding Supervision
The paper "Context-Informed Grounding Supervision" explores a new approach for training LLMs to generate contextually grounded responses. The approach, termed Context-Informed Grounding Supervision (CInGS), addresses a well-documented limitation of LLMs: their propensity to hallucinate or fall back on incorrect internal knowledge even when relevant external context is provided.
This study builds on existing work that seeks to overcome LLM hallucinations and limited controllability by integrating external knowledge. The authors observe that merely appending relevant external context to a model's input at inference time does not guarantee that responses are grounded in it. Previous attempts to address this include modified decoding strategies, auxiliary correction modules, and knowledge integration pipelines. However, less attention has been paid to training the LLM itself to naturally incorporate and prioritize external context.
To tackle these challenges, the authors propose CInGS, a straightforward adaptation of standard instruction tuning. Usually, an LLM is trained to produce a response directly from the input instruction. CInGS instead prepends relevant external context to the expected response during training, but computes the loss only over the response tokens, masking out the context. This design aims to reinforce the model's reliance on external information without degrading general downstream performance.
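The masking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the token ids are toy values, and the helper name `build_training_example` is assumed; real pipelines would use a tokenizer and a framework convention such as PyTorch's `ignore_index=-100` for cross-entropy loss.

```python
# Minimal sketch of CInGS-style loss masking (illustrative, not the paper's code).
IGNORE_INDEX = -100  # label value ignored by cross-entropy loss in common frameworks

def build_training_example(instruction_ids, context_ids, response_ids):
    """Prepend the external context to the target sequence, but set labels so
    that the loss is computed only over the response tokens."""
    input_ids = list(instruction_ids) + list(context_ids) + list(response_ids)
    labels = (
        [IGNORE_INDEX] * len(instruction_ids)  # no loss on the instruction
        + [IGNORE_INDEX] * len(context_ids)    # no loss on the prepended context
        + list(response_ids)                   # loss only on the response
    )
    return input_ids, labels

# Toy example: instruction [1, 2], context [10, 11, 12], response [20, 21].
ids, labels = build_training_example([1, 2], [10, 11, 12], [20, 21])
print(ids)     # [1, 2, 10, 11, 12, 20, 21]
print(labels)  # [-100, -100, -100, -100, -100, 20, 21]
```

The model still conditions on the context tokens when predicting the response, which is what encourages it to rely on external information; masking simply prevents the context itself from contributing to the training loss.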
In empirical testing, CInGS demonstrated superior grounding in both text and vision-language domains. In the text domain, CInGS outperformed standard instruction-tuned models across 11 information-seeking datasets, yielding an average absolute improvement of 5.5%, with especially large gains in settings that critically depend on context usage. It also surpassed other grounding-focused training methods such as Self-RAG and FactTune. Moreover, when combined with inference-time grounding techniques like AdaCAD and CORG, CInGS yielded further gains, demonstrating its complementary nature.
The study also highlights CInGS's value in vision-language models (VLMs). Replacing a VLM's language backbone with a CInGS-trained one reduced hallucinations and improved factual consistency, particularly on hallucination-focused benchmarks. Notably, CInGS-trained VLMs maintained robust performance even in scenarios with potentially distracting information, and proved more adept at sustaining factual accuracy throughout an entire response, a common weakness of standard models.
The paper's analysis attributes CInGS's effectiveness to a dual mechanism: reduced reliance on outdated internal knowledge, prompted by training with relevant context, and an implicit behavioral shift toward prioritizing the input context during generation. The model tends to gradually forget prior incorrect knowledge, striking a balance between parametric recall and context grounding. Additionally, attention analysis reveals that CInGS-trained models attend more to the external context than to their own previously generated tokens, reinforcing this grounding behavior.
In conclusion, CInGS marks a significant advance in the grounding capabilities of LLMs. It offers a practical and scalable solution that aligns with ongoing efforts to minimize hallucinations by leveraging external knowledge effectively. While its integration into vision-language settings opens new pathways for multimodal understanding, the core benefit of CInGS lies in how seamlessly it combines with existing and future methods for improving contextual responsiveness, without sacrificing general language understanding. Future work may refine and scale CInGS across a broader range of tasks and applications.