The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations

Published 8 Oct 2023 in cs.AI | (2310.04988v2)

Abstract: The recent advancements in LLMs have garnered widespread acclaim for their remarkable emerging capabilities. However, the issue of hallucination has emerged in parallel as a by-product, posing significant concerns. While some recent endeavors have been made to identify and mitigate different types of hallucination, there has been limited emphasis on the nuanced categorization of hallucination and associated mitigation methods. To address this gap, we offer a fine-grained discourse on profiling hallucination based on its degree, orientation, and category, along with strategies for alleviation. As such, we define two overarching orientations of hallucination: (i) factual mirage (FM) and (ii) silver lining (SL). To provide a more comprehensive understanding, both orientations are further sub-categorized into intrinsic and extrinsic, with three degrees of severity: (i) mild, (ii) moderate, and (iii) alarming. We also meticulously categorize hallucination into six types: (i) acronym ambiguity, (ii) numeric nuisance, (iii) generated golem, (iv) virtual voice, (v) geographic erratum, and (vi) time wrap. Furthermore, we curate HallucInation eLiciTation (HILT), a publicly available dataset comprising 75,000 samples generated using 15 contemporary LLMs along with human annotations for the aforementioned categories. Finally, to establish a method for quantifying hallucination and to offer a comparative spectrum that allows us to evaluate and rank LLMs based on their vulnerability to producing hallucinations, we propose the Hallucination Vulnerability Index (HVI). We firmly believe that HVI holds significant value as a tool for the wider NLP community, with the potential to serve as a rubric in AI-related policy-making. In conclusion, we propose two solution strategies for mitigating hallucinations.

Summary

The paper presents a detailed taxonomy of hallucinations and introduces the Hallucination Vulnerability Index (HVI) to measure and rank LLMs by their susceptibility to hallucination.
It proposes a black-box strategy (high-entropy word replacement) and a gray-box strategy (sentence-level factuality checks) to mitigate hallucinations.
Findings indicate that larger LLMs trained without reinforcement learning from human feedback (RLHF) are more prone to hallucination, underscoring the need for improved training protocols.

The Troubling Emergence of Hallucination in LLMs

Introduction to Hallucination in LLMs

The advent of generative AI, including LLMs such as GPT and text-to-image models such as DALL-E and Stable Diffusion, has been accompanied by significant challenges, notably the phenomenon known as "hallucination". In LLMs, hallucination refers to generated content that is factually incorrect or unsubstantiated. This paper presents a comprehensive analysis of hallucination in LLMs: a detailed taxonomy of hallucination types, the Hallucination Vulnerability Index (HVI), and a set of mitigation strategies.

Figure 1: Hallucination: orientation, category, and degree (decreasing level of difficulty from top to bottom).

Taxonomy of Hallucination

The taxonomy proposed in the paper categorizes hallucinations into two primary orientations: Factual Mirage (FM) and Silver Lining (SL), each further divided into intrinsic and extrinsic sub-categories. Additionally, six specific types of hallucination are identified: Acronym Ambiguity, Numeric Nuisance, Generated Golem, Virtual Voice, Geographic Erratum, and Time Wrap.
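
To make the annotation axes concrete, the sketch below models one label as a small Python record covering orientation, sub-type, degree of severity, and category. The enum values mirror the paper's terms, but the class names, field names, and the three toy labels are hypothetical illustrations, not the HILT schema or HILT data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Orientation(Enum):
    FACTUAL_MIRAGE = "FM"  # hallucination in response to a factually correct prompt
    SILVER_LINING = "SL"   # hallucination in response to a factually incorrect prompt


class Subtype(Enum):
    INTRINSIC = "intrinsic"
    EXTRINSIC = "extrinsic"


class Degree(Enum):
    MILD = 1
    MODERATE = 2
    ALARMING = 3


class Category(Enum):
    ACRONYM_AMBIGUITY = "acronym ambiguity"
    NUMERIC_NUISANCE = "numeric nuisance"
    GENERATED_GOLEM = "generated golem"
    VIRTUAL_VOICE = "virtual voice"
    GEOGRAPHIC_ERRATUM = "geographic erratum"
    TIME_WRAP = "time wrap"


@dataclass(frozen=True)
class HallucinationLabel:
    """One annotated hallucinated output (hypothetical field names)."""
    model: str
    orientation: Orientation
    subtype: Subtype
    degree: Degree
    category: Category


def count_by_category(labels):
    """Tally annotations per hallucination category."""
    return Counter(label.category for label in labels)


# Toy labels for illustration only; not drawn from HILT.
labels = [
    HallucinationLabel("model-a", Orientation.FACTUAL_MIRAGE, Subtype.EXTRINSIC,
                       Degree.ALARMING, Category.NUMERIC_NUISANCE),
    HallucinationLabel("model-a", Orientation.SILVER_LINING, Subtype.INTRINSIC,
                       Degree.MILD, Category.TIME_WRAP),
    HallucinationLabel("model-b", Orientation.FACTUAL_MIRAGE, Subtype.INTRINSIC,
                       Degree.MODERATE, Category.NUMERIC_NUISANCE),
]
print(count_by_category(labels)[Category.NUMERIC_NUISANCE])  # → 2
```

Keeping the axes as separate enums lets the same record be sliced by orientation, severity, or category without re-annotating.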

Factual Mirage

Factual Mirage occurs when an LLM hallucinates in response to a factually correct prompt. In an intrinsic Factual Mirage, an otherwise accurate response includes tangential facts; in an extrinsic Factual Mirage, the output contradicts the facts underlying the prompt.

Silver Lining

Silver Lining describes scenarios where an LLM goes along with a factually incorrect prompt and builds a response on its false premise. In intrinsic cases the response does not develop a convincing narrative, whereas in extrinsic cases the model generates a detailed and convincing but incorrect story.
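
Read together, the two definitions form a simple decision rule: the orientation depends on whether the prompt is factually correct, and the sub-type on how the response treats it. The minimal sketch below assumes a human annotator has already judged the response to be hallucinated and supplies three boolean judgements; the parameter names and the `Label` enum are hypothetical.

```python
from enum import Enum


class Label(Enum):
    INTRINSIC_FM = "intrinsic factual mirage"
    EXTRINSIC_FM = "extrinsic factual mirage"
    INTRINSIC_SL = "intrinsic silver lining"
    EXTRINSIC_SL = "extrinsic silver lining"


def orientation_label(prompt_is_factual: bool,
                      contradicts_facts: bool,
                      elaborate_and_convincing: bool) -> Label:
    """Map annotator judgements (hypothetical flags) to an FM/SL label.

    Assumes the response is already known to be hallucinated. Only the flag
    relevant to the chosen orientation is consulted.
    """
    if prompt_is_factual:
        # Factual Mirage: a correct prompt is distorted. Extrinsic if the output
        # contradicts the facts; intrinsic if it only adds tangential material.
        return Label.EXTRINSIC_FM if contradicts_facts else Label.INTRINSIC_FM
    # Silver Lining: the model accepts a false premise. Extrinsic if it spins a
    # detailed, convincing story; intrinsic otherwise.
    return Label.EXTRINSIC_SL if elaborate_and_convincing else Label.INTRINSIC_SL


print(orientation_label(True, False, False).value)  # → intrinsic factual mirage
print(orientation_label(False, False, True).value)  # → extrinsic silver lining
```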

Hallucination Vulnerability Index (HVI)

The HVI is introduced as a quantifiable metric allowing the assessment and ranking of LLMs based on their susceptibility to generating hallucinations. The formula for HVI incorporates the frequency and severity of hallucinations, using damping factors to adjust for varying levels of distortion within generated content. The index provides a comparative analysis across different models, offering critical insights for policymakers and researchers.
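
The paper's exact HVI formula is not reproduced here. As a rough illustration of the idea only, the sketch below computes a toy 0-100 index in which each hallucinated output contributes a severity weight and a damping factor down-weights intrinsic hallucinations relative to extrinsic ones. The weights, damping values, and model names are assumptions chosen for the example; lower scores mean lower vulnerability, so ranking sorts in ascending order.

```python
from dataclasses import dataclass

# Hypothetical constants; the paper's HVI uses its own formula and factors.
DEGREE_WEIGHT = {"mild": 1 / 3, "moderate": 2 / 3, "alarming": 1.0}
DAMPING = {"intrinsic": 0.5, "extrinsic": 1.0}  # damp milder, intrinsic distortions


@dataclass
class Judgement:
    hallucinated: bool
    degree: str = "mild"        # one of DEGREE_WEIGHT's keys
    subtype: str = "intrinsic"  # one of DAMPING's keys


def vulnerability_index(judgements):
    """Toy 0-100 index: hallucination frequency weighted by severity."""
    if not judgements:
        raise ValueError("need at least one judged output")
    total = sum(
        DEGREE_WEIGHT[j.degree] * DAMPING[j.subtype]
        for j in judgements
        if j.hallucinated
    )
    return 100.0 * total / len(judgements)


def rank_models(per_model):
    """Return (model, index) pairs, least vulnerable first."""
    scores = {m: vulnerability_index(js) for m, js in per_model.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])


per_model = {
    "model-a": [Judgement(True, "alarming", "extrinsic"), Judgement(False)],
    "model-b": [Judgement(True, "mild", "intrinsic"), Judgement(False),
                Judgement(False), Judgement(False)],
}
for model, score in rank_models(per_model):
    print(f"{model}: {score:.1f}")  # prints "model-b: 4.2", then "model-a: 50.0"
```

Because each output contributes at most 1, this toy index stays between 0 (no hallucinations) and 100 (every output is an alarming, extrinsic hallucination).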

Figure 2: HVI for different hallucination categories across various LLMs.

Mitigation Strategies

Two primary strategies for mitigating hallucinations are presented:

High Entropy Word Spotting and Replacement (ENTROPY_BB): This black-box approach uses open-source LLMs to identify high-entropy words in hallucinated text and replaces them with alternatives generated by an LLM with a lower HVI. The authors report that it is effective at reducing Acronym Ambiguity and Numeric Nuisance hallucinations.
Factuality Check of Sentences (FACTUALITY_GB): This gray-box approach uses textual entailment to check AI-generated sentences against external factual sources, flagging content that requires human review.
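
The spotting-and-replacement loop of ENTROPY_BB can be sketched as follows, assuming a detector that returns a probability distribution over candidate words at each position and a replacer that proposes a substitute. Both callables are hypothetical stand-ins for the open-source detector LLM and the lower-HVI replacement LLM; the entropy threshold is likewise an assumed parameter, and the toy stubs are hard-coded for the example.

```python
import math
from typing import Callable, Dict, List

# A distribution maps candidate words to probabilities at one position.
Distribution = Dict[str, float]


def entropy(dist: Distribution) -> float:
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def spot_and_replace(
    words: List[str],
    detector: Callable[[List[str], int], Distribution],
    replacer: Callable[[List[str], int], str],
    threshold_bits: float,
) -> List[str]:
    """Replace words whose detector entropy exceeds threshold_bits.

    Both callables see the original sentence, so earlier replacements do not
    influence later decisions in this simplified sketch.
    """
    out = list(words)
    for i in range(len(words)):
        if entropy(detector(words, i)) > threshold_bits:
            out[i] = replacer(words, i)
    return out


# Toy stubs: the detector is uncertain only about numbers.
def toy_detector(words: List[str], i: int) -> Distribution:
    if words[i].isdigit():
        return {"1889": 0.3, "1899": 0.3, "1887": 0.2, "1900": 0.2}
    return {words[i]: 1.0}


def toy_replacer(words: List[str], i: int) -> str:
    return "1889"  # stands in for a lower-HVI model's suggestion


sentence = "The Eiffel Tower opened in 1899".split()
result = spot_and_replace(sentence, toy_detector, toy_replacer, threshold_bits=1.0)
print(" ".join(result))  # → The Eiffel Tower opened in 1889
```

Here the year's candidate distribution has about 1.97 bits of entropy, above the 1-bit threshold, while every other word has zero entropy and is left alone.
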
Figure 3: A hallucination example pre- and post-mitigation.
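
A FACTUALITY_GB-style check can be sketched as a loop over sentences that keeps the best entailment score against any evidence passage and flags sentences falling below a threshold for human review. The `entails` callable stands in for a real textual-entailment (NLI) model; the `toy_entails` used here is only a crude word-overlap proxy, not entailment, and the evidence passage and threshold are assumptions for the example.

```python
import re
from typing import Callable, List, Tuple

# entails(premise, hypothesis) -> score that the premise supports the hypothesis.
EntailFn = Callable[[str, str], float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter on '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def flag_for_review(text: str, evidence: List[str], entails: EntailFn,
                    threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (sentence, best_score) for sentences no evidence passage supports."""
    flagged = []
    for sentence in split_sentences(text):
        best = max((entails(p, sentence) for p in evidence), default=0.0)
        if best < threshold:
            flagged.append((sentence, best))
    return flagged


def _words(s: str) -> set:
    return set(re.findall(r"\w+", s.lower()))


def toy_entails(premise: str, hypothesis: str) -> float:
    """Crude lexical proxy: fraction of hypothesis words found in the premise."""
    hyp = _words(hypothesis)
    if not hyp:
        return 0.0
    return len(hyp & _words(premise)) / len(hyp)


evidence = ["The Eiffel Tower was completed in 1889 for the World's Fair in Paris."]
text = "The Eiffel Tower was completed in 1889. It was designed by Leonardo da Vinci."
flagged = flag_for_review(text, evidence, toy_entails)
for sentence, score in flagged:
    print(f"REVIEW ({score:.2f}): {sentence}")
```

Swapping `toy_entails` for an NLI model's entailment probability leaves the review loop unchanged.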

Discussion

The findings underscore the complexity of fully mitigating hallucinations due to the inherent challenges within LLM architectures and the variability in their outputs. Larger LLMs without reinforcement learning from human feedback have shown a greater propensity for hallucination, highlighting an area for further refinement in model training processes.

Conclusion

The paper provides a foundational framework for understanding and addressing hallucination in LLMs, presenting a taxonomy alongside practical tools such as the HVI and two mitigation strategies. The work underscores the importance of continued research into refining LLMs to ensure higher fidelity in generated content, alongside supportive regulatory measures informed by these insights. Future research is encouraged to advance hybrid mitigation methods that leverage the combined strengths of black-box and gray-box approaches.