
Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism

Published 2 Nov 2023 in cs.CL and cs.AI | arXiv:2311.01041v4

Abstract: LLMs have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless and often produce responses that contain errors or misinformation. These inaccuracies, commonly referred to as hallucinations, render LLMs unreliable and even unusable in many scenarios. In this paper, our focus is on mitigating the issue of hallucination in LLMs, particularly in the context of question-answering. Instead of attempting to answer all questions, we explore a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates the refusal mechanism to enable LLMs to recognize and refuse to answer questions that they find difficult to address. To achieve this, we utilize a structured knowledge base to represent all the LLM's understanding of the world, enabling it to provide traceable gold knowledge. This knowledge base is separate from the LLM and initially empty. It can be filled with validated knowledge and progressively expanded. When an LLM encounters questions outside its domain, the system recognizes its knowledge scope and determines whether it can answer the question independently. Additionally, we introduce a method for automatically and efficiently expanding the knowledge base of LLMs. Through qualitative and quantitative analysis, we demonstrate that our approach enhances the controllability and reliability of LLMs.


Summary

  • The paper demonstrates that integrating structured knowledge limitation and dual-layer refusal significantly mitigates LLM hallucinations and improves QA accuracy.
  • The methodology utilizes a separate, validated knowledge base to ensure that only verifiable facts inform responses.
  • Experimental results reveal an accuracy boost from 46.6% to 65.1% on the MC1 task, highlighting the framework's practical benefits.

Learn to Refuse: Enhancing LLMs via Knowledge Limitation and Refusal

Introduction

In the paper "Learn to Refuse: Making LLMs More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism" (2311.01041), the authors introduce Learn to Refuse (L2R), an approach for improving the reliability of LLMs in question-answering (QA) scenarios. L2R mitigates factual inconsistencies, commonly termed hallucinations, by equipping the LLM to identify challenging queries and abstain from answering them. The paper also proposes a structured knowledge base, kept separate from the LLM's internal parameters, which makes the QA process more traceable and controllable.

Figure 1: The overview of L2R. Unlike traditional LLM-based QA systems, which answer every question directly, L2R can refuse the user's question depending on the situation.

Methodology

The L2R framework consists of two core components: Knowledge Scope Limitation and the Refusal Mechanism.

  1. Knowledge Scope Limitation: L2R restricts the LLM's usable knowledge to an independent, structured knowledge base of verified facts. The knowledge base starts empty and is populated progressively with validated, vetted entries.
  2. Refusal Mechanism: The refusal mechanism involves two layers of judgment:
    • Soft Refusal: An internal assessment by the LLM, instructed via prompts, to determine answerability.
    • Hard Refusal: A metric-based evaluation of the retrieved knowledge's confidence and relevance (similarity score), decided by a threshold α.

This dual-tier refusal mechanism improves response accuracy by declining to answer whenever the retrieved knowledge lacks either confidence or relevance; a minimal sketch of both gates follows Figure 2.

Figure 2: The framework of L2R. L2R consists of two main components: manual or automatic knowledge enrichment and question answering based on structured knowledge.
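
The following is a minimal Python sketch of the two refusal gates, not the paper's exact implementation: the helpers `embed` (text to unit vector) and `llm` (chat-completion wrapper) are hypothetical, and the threshold, prompt wording, and top-k values are illustrative.

```python
import numpy as np

ALPHA = 0.75  # hard-refusal similarity threshold (illustrative value)

def retrieve(question_vec, kb_vecs, kb_texts, k=3):
    """Return the k knowledge entries most similar to the question."""
    sims = kb_vecs @ question_vec            # cosine similarity for unit vectors
    top = np.argsort(-sims)[:k]
    return [(kb_texts[i], float(sims[i])) for i in top]

def answer_or_refuse(question, kb_vecs, kb_texts, embed, llm):
    hits = retrieve(embed(question), kb_vecs, kb_texts)
    # Hard refusal: no retrieved entry clears the similarity threshold alpha.
    if not hits or max(s for _, s in hits) < ALPHA:
        return "None"  # the question falls outside the knowledge scope
    context = "\n".join(text for text, _ in hits)
    # Soft refusal: the prompt instructs the model itself to decline
    # when the retrieved knowledge is insufficient to answer.
    prompt = (
        "Answer using ONLY the knowledge below. "
        "If it is insufficient, reply exactly 'None'.\n\n"
        f"Knowledge:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```

In this sketch a refusal surfaces as the literal string "None", matching the red-highlighted refusals shown in Figure 3.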

Experimental Results

Quantitative Analysis: Experiments, primarily conducted on datasets like TruthfulQA, indicate that L2R achieves superior accuracy by refusing to answer a pre-determined percentage of questions. For instance, L2R improved the accuracy of gpt-3.5-turbo from 46.6% to 65.1% on the MC1 task by introducing hard and soft refusal mechanisms.

Qualitative Analysis: Case studies show that the refusal mechanism reliably identifies gaps in the knowledge base and thereby steers the system away from potential inaccuracies.

Figure 3: The results of qualitative experiments. A red-highlighted "None" indicates that the system refused to answer the question based on its limited knowledge base.

Implementation Considerations

Resource Requirements and Scalability

Effective implementation of L2R requires embedding vectors for knowledge retrieval and maintenance of a structured knowledge base. While the paper works with a modestly sized knowledge base, scaling to millions of entries in real-world applications calls for robust retrieval techniques such as FAISS to keep querying efficient.
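
As an illustration of scaling retrieval, here is a small FAISS sketch using an exact inner-product index over L2-normalized embeddings (equivalent to cosine similarity). The random vectors are placeholders; in practice they would come from a sentence encoder such as Sentence-BERT.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 384                        # embedding dimension (encoder-dependent)
kb = np.random.rand(10_000, d).astype("float32")
faiss.normalize_L2(kb)         # unit vectors: inner product == cosine similarity

index = faiss.IndexFlatIP(d)   # exact search; swap in an IVF index at larger scale
index.add(kb)

query = np.random.rand(1, d).astype("float32")
faiss.normalize_L2(query)
sims, ids = index.search(query, 3)   # top-3 similarities and knowledge-entry ids
print(ids[0], sims[0])
```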

Hyperparameter Tuning: The threshold α for the hard refusal mechanism plays a pivotal role in balancing answer accuracy against the number of refusals, so it requires careful tuning based on the needs of the application scenario.

Figure 4: Changes in refusal count and accuracy as the threshold α varies.
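
To make the Figure 4 trade-off concrete, the sketch below sweeps a hypothetical α over synthetic data: `sims` stands in for each question's best retrieval similarity and `correct` for whether the model's answer would be right; neither reflects the paper's actual measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
sims = rng.uniform(0.3, 1.0, size=1000)           # best retrieval similarity per question
correct = rng.random(1000) < (0.3 + 0.6 * sims)   # answers likelier correct when similarity is high

for alpha in np.linspace(0.4, 0.9, 6):
    answered = sims >= alpha                      # hard refusal rejects the rest
    acc = correct[answered].mean() if answered.any() else float("nan")
    print(f"alpha={alpha:.2f}  refused={int((~answered).sum()):4d}  accuracy={acc:.3f}")
```

As α rises, more questions are refused and accuracy on the remaining answered questions climbs, mirroring the trend in Figure 4.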

Implications and Future Directions

The introduction of L2R provides a foundation for making LLMs more controllable and reliable by tying answer quality to verified factual data. Future work may explore more sophisticated refusal-function designs or improved retrieval algorithms. Extending these principles beyond QA to other NLP tasks, such as summarization and decision-making, also remains a promising avenue.

Conclusion

L2R offers a viable approach to addressing the hallucination problem in LLM-based systems by employing a novel refusal mechanism and knowledge scope limitation. The structured knowledge base not only enhances accuracy but also augments explainability, thus rendering LLMs more suitable for applications where factual integrity is paramount. The framework presents a potential direction for enhancing LLM reliability in a structured and scalable fashion.
