
Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism

Published 2 Nov 2023 in cs.CL and cs.AI | arXiv:2311.01041v4

Abstract: LLMs have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless and often produce responses that contain errors or misinformation. These inaccuracies, commonly referred to as hallucinations, render LLMs unreliable and even unusable in many scenarios. In this paper, our focus is on mitigating the issue of hallucination in LLMs, particularly in the context of question-answering. Instead of attempting to answer all questions, we explore a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates the refusal mechanism to enable LLMs to recognize and refuse to answer questions that they find difficult to address. To achieve this, we utilize a structured knowledge base to represent all the LLM's understanding of the world, enabling it to provide traceable gold knowledge. This knowledge base is separate from the LLM and initially empty. It can be filled with validated knowledge and progressively expanded. When an LLM encounters questions outside its domain, the system recognizes its knowledge scope and determines whether it can answer the question independently. Additionally, we introduce a method for automatically and efficiently expanding the knowledge base of LLMs. Through qualitative and quantitative analysis, we demonstrate that our approach enhances the controllability and reliability of LLMs.


Summary

  • The paper demonstrates that integrating structured knowledge limitation and dual-layer refusal significantly mitigates LLM hallucinations and improves QA accuracy.
  • The methodology utilizes a separate, validated knowledge base to ensure that only verifiable facts inform responses.
  • Experimental results reveal an accuracy boost from 46.6% to 65.1% on the MC1 task, highlighting the framework's practical benefits.

Learn to Refuse: Enhancing LLMs via Knowledge Limitation and Refusal

Introduction

In the paper "Learn to Refuse: Making LLMs More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism" (2311.01041), the authors introduce Learn to Refuse (L2R), an approach for improving the reliability of LLMs in question-answering (QA) scenarios. L2R mitigates factual inconsistencies, commonly termed hallucinations, by equipping the LLM to identify challenging queries and abstain from answering them. The paper also proposes a structured knowledge base, kept separate from the LLM's internal parameters, which makes the QA process more traceable and controllable.

Figure 1: The overview of L2R. Unlike traditional LLM-based QA systems, which answer every question directly, L2R can refuse the user's question depending on the situation.

Methodology

The L2R framework consists of two core components: Knowledge Scope Limitation and the Refusal Mechanism.

  1. Knowledge Scope Limitation: L2R restricts the LLM's usable knowledge to an independent, structured knowledge base of verified facts. The knowledge base starts empty and is populated progressively with validated, vetted entries.
  2. Refusal Mechanism: The refusal mechanism involves two layers of judgment:
    • Soft Refusal: An internal assessment by the LLM, instructed via prompts, to determine answerability.
    • Hard Refusal: A metric-based evaluation of the retrieved knowledge's confidence and relevance (similarity score), decided by a threshold α.

This dual-tier refusal mechanism improves response accuracy by declining to answer whenever the retrieved knowledge lacks either confidence or relevance; a minimal sketch of both gates follows Figure 2.

Figure 2: The framework of L2R. L2R consists of two main components: manual or automatic knowledge enrichment and question answering based on structured knowledge.
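
The following is a minimal Python sketch of the two refusal gates, not the paper's exact implementation: the helpers `embed` (text to unit vector) and `llm` (chat-completion wrapper) are hypothetical, and the threshold, prompt wording, and top-k values are illustrative.

```python
import numpy as np

ALPHA = 0.75  # hard-refusal similarity threshold (illustrative value)

def retrieve(question_vec, kb_vecs, kb_texts, k=3):
    """Return the k knowledge entries most similar to the question."""
    sims = kb_vecs @ question_vec            # cosine similarity for unit vectors
    top = np.argsort(-sims)[:k]
    return [(kb_texts[i], float(sims[i])) for i in top]

def answer_or_refuse(question, kb_vecs, kb_texts, embed, llm):
    hits = retrieve(embed(question), kb_vecs, kb_texts)
    # Hard refusal: no retrieved entry clears the similarity threshold alpha.
    if not hits or max(s for _, s in hits) < ALPHA:
        return "None"  # the question falls outside the knowledge scope
    context = "\n".join(text for text, _ in hits)
    # Soft refusal: the prompt instructs the model itself to decline
    # when the retrieved knowledge is insufficient to answer.
    prompt = (
        "Answer using ONLY the knowledge below. "
        "If it is insufficient, reply exactly 'None'.\n\n"
        f"Knowledge:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```

In this sketch a refusal surfaces as the literal string "None", matching the red-highlighted refusals shown in Figure 3.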

Experimental Results

Quantitative Analysis: Experiments, primarily conducted on datasets like TruthfulQA, indicate that L2R achieves superior accuracy by refusing to answer a pre-determined percentage of questions. For instance, L2R improved the accuracy of gpt-3.5-turbo from 46.6% to 65.1% on the MC1 task by introducing hard and soft refusal mechanisms.

Qualitative Analysis: Case studies show that the refusal mechanism reliably identifies gaps in the knowledge base and thereby steers the system away from potential inaccuracies.

Figure 3: The results of qualitative experiments. A red-highlighted "None" indicates that the system refused to answer the question based on its limited knowledge base.

Implementation Considerations

Resource Requirements and Scalability

Effective implementation of L2R requires embedding vectors for knowledge retrieval and maintenance of a structured knowledge base. While the paper works with a modestly sized knowledge base, scaling to millions of entries in real-world applications calls for robust retrieval techniques such as FAISS to keep querying efficient.
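
As an illustration of scaling retrieval, here is a small FAISS sketch using an exact inner-product index over L2-normalized embeddings (equivalent to cosine similarity). The random vectors are placeholders; in practice they would come from a sentence encoder such as Sentence-BERT.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 384                        # embedding dimension (encoder-dependent)
kb = np.random.rand(10_000, d).astype("float32")
faiss.normalize_L2(kb)         # unit vectors: inner product == cosine similarity

index = faiss.IndexFlatIP(d)   # exact search; swap in an IVF index at larger scale
index.add(kb)

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
sims, ids = index.search(query, 3)   # top-3 similarities and knowledge-entry ids
print(ids[0], sims[0])
```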

Hyperparameter Tuning: The threshold α for the hard refusal mechanism plays a pivotal role in balancing answer accuracy against the number of refusals, so it requires careful tuning based on the needs of the application scenario.

Figure 4: Changes in refusal count and accuracy as the threshold α varies.
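
To make the Figure 4 trade-off concrete, the sketch below sweeps a hypothetical α over synthetic data: `sims` stands in for each question's best retrieval similarity and `correct` for whether the model's answer would be right; neither reflects the paper's actual measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
sims = rng.uniform(0.3, 1.0, size=1000)           # best retrieval similarity per question
correct = rng.random(1000) < (0.3 + 0.6 * sims)   # answers likelier correct when similarity is high

for alpha in np.linspace(0.4, 0.9, 6):
    answered = sims >= alpha                      # hard refusal rejects the rest
    acc = correct[answered].mean() if answered.any() else float("nan")
    print(f"alpha={alpha:.2f}  refused={int((~answered).sum()):4d}  accuracy={acc:.3f}")
```

As α rises, more questions are refused and accuracy on the remaining answered questions climbs, mirroring the trend in Figure 4.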

Implications and Future Directions

The introduction of L2R provides a foundation for making LLMs more controllable and reliable by tying answer quality to verified factual data. Future work may explore more sophisticated refusal-function designs or improved retrieval algorithms. Extending these principles beyond QA to other NLP tasks, such as summarization and decision-making, also remains a promising avenue.

Conclusion

L2R offers a viable approach to addressing the hallucination problem in LLM-based systems by employing a novel refusal mechanism and knowledge scope limitation. The structured knowledge base not only enhances accuracy but also augments explainability, thus rendering LLMs more suitable for applications where factual integrity is paramount. The framework presents a potential direction for enhancing LLM reliability in a structured and scalable fashion.
