A comprehensive taxonomy of hallucinations in Large Language Models

Published 3 Aug 2025 in cs.CL and cs.AI | (2508.01781v1)

Abstract: LLMs have revolutionized natural language processing, yet their propensity for hallucination, generating plausible but factually incorrect or fabricated content, remains a critical challenge. This report provides a comprehensive taxonomy of LLM hallucinations, beginning with a formal definition and a theoretical framework that posits its inherent inevitability in computable LLMs, irrespective of architecture or training. It explores core distinctions, differentiating between intrinsic (contradicting input context) and extrinsic (inconsistent with training data or reality), as well as factuality (absolute correctness) and faithfulness (adherence to input). The report then details specific manifestations, including factual errors, contextual and logical inconsistencies, temporal disorientation, ethical violations, and task-specific hallucinations across domains like code generation and multimodal applications. It analyzes the underlying causes, categorizing them into data-related issues, model-related factors, and prompt-related influences. Furthermore, the report examines cognitive and human factors influencing hallucination perception, surveys evaluation benchmarks and metrics for detection, and outlines architectural and systemic mitigation strategies. Finally, it introduces web-based resources for monitoring LLM releases and performance. This report underscores the complex, multifaceted nature of LLM hallucinations and emphasizes that, given their theoretical inevitability, future efforts must focus on robust detection, mitigation, and continuous human oversight for responsible and reliable deployment in critical applications.

Summary

  • The paper introduces a formal proof that hallucinations in any computable LLM are inevitable, using computability and diagonalization arguments.
  • The paper proposes a dual-axis taxonomy distinguishing intrinsic/extrinsic and factuality/faithfulness hallucinations to enable standardized evaluation.
  • The analysis identifies data, model, and prompt factors as causes and suggests layered mitigation strategies and human oversight for safe deployment.

A Comprehensive Taxonomy of Hallucinations in LLMs

Introduction and Motivation

The paper "A comprehensive taxonomy of hallucinations in LLMs" (2508.01781) presents a systematic and theoretically grounded analysis of hallucinations in LLMs, addressing both their formal inevitability and their diverse empirical manifestations. The work synthesizes formal definitions, taxonomic frameworks, empirical typologies, causal analyses, evaluation methodologies, and mitigation strategies, providing a unified reference for researchers and practitioners concerned with the reliability and safety of LLM deployments.

Formal Definition and Theoretical Inevitability

A central contribution is the formalization of hallucination as an inconsistency between a computable LLM h and a computable ground truth function f, within a formal world G_f = {(s, f(s)) | s ∈ S}. The paper leverages computability theory, specifically diagonalization arguments, to prove that for any computable LLM, there exists a computable f such that the LLM will hallucinate on at least one, and in fact infinitely many, inputs. This result is architecture-agnostic and holds regardless of training data, learning algorithm, or prompting strategy. The corollary is that no computable LLM can self-eliminate hallucination, and thus, hallucination is not a removable artifact but an intrinsic limitation of the LLM paradigm.
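
Restated compactly (a sketch in the notation described above, reading "h hallucinates on input s" as a mismatch between h(s) and the ground truth f(s)):

    % Sketch of the formal setup, restated from the description above.
    \[ G_f = \{(s, f(s)) \mid s \in S\} \quad \text{(formal world induced by the ground truth } f\text{)} \]
    \[ h \text{ hallucinates on } s \iff h(s) \neq f(s) \]
    \[ \text{Inevitability: } \forall\, h \text{ computable},\ \exists\, f \text{ computable such that } \{\, s \in S : h(s) \neq f(s) \,\} \text{ is infinite.} \]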

This theoretical inevitability has direct implications for deployment: LLMs cannot be trusted as autonomous agents in safety-critical or high-stakes domains without external verification, guardrails, or human oversight.

Core Taxonomies: Intrinsic/Extrinsic and Factuality/Faithfulness

The paper delineates two orthogonal axes for classifying hallucinations:

  • Intrinsic vs. Extrinsic: Intrinsic hallucinations contradict the provided input or context (e.g., internal logical inconsistencies or misrepresentation of source material), while extrinsic hallucinations introduce content unsupported or refuted by the input or training data (e.g., fabricated entities or events).
  • Factuality vs. Faithfulness: Factuality hallucinations are factually incorrect with respect to external reality or knowledge bases, whereas faithfulness hallucinations diverge from the input prompt or context, regardless of external truth.

These axes are not mutually exclusive and often overlap in real-world cases. The lack of a unified taxonomy in the literature is identified as a barrier to standardized evaluation and mitigation.
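
As a rough illustration of how such a dual-axis label could be operationalized in annotation tooling (a hypothetical sketch; the class names and example are not from the paper):

    # A hypothetical sketch (not from the paper) of annotating one output span
    # along the two axes described above.
    from dataclasses import dataclass
    from enum import Enum

    class Grounding(Enum):
        INTRINSIC = "contradicts the provided input or context"
        EXTRINSIC = "unsupported by the input or training data"

    class Dimension(Enum):
        FACTUALITY = "incorrect with respect to external reality"
        FAITHFULNESS = "diverges from the prompt or provided context"

    @dataclass
    class HallucinationLabel:
        span: str              # offending text span in the model output
        grounding: Grounding
        dimension: Dimension

    # Example: the source says "approved", the summary says "rejected".
    label = HallucinationLabel(
        span="the committee rejected the proposal",
        grounding=Grounding.INTRINSIC,
        dimension=Dimension.FAITHFULNESS,
    )
    print(label)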

Specific Manifestations and Task-Specific Hallucinations

The taxonomy is further refined into concrete categories, including:

  • Factual errors and fabrications: Incorrect facts, invented entities, or fabricated citations.
  • Contextual inconsistencies: Contradictions or unsupported additions relative to the input.
  • Instruction deviation: Failure to follow explicit user instructions.
  • Logical inconsistencies: Internal contradictions or reasoning errors.
  • Temporal disorientation: Outdated or anachronistic information.
  • Ethical violations: Harmful, defamatory, or legally incorrect outputs.
  • Amalgamated hallucinations: Erroneous blending of multiple facts.
  • Nonsensical responses: Irrelevant or incoherent outputs.
  • Task-specific: Hallucinations in code generation, multimodal reasoning, dialogue, summarization, and QA.

The diversity of these manifestations underscores the need for domain- and task-specific detection and mitigation strategies.
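
As one deliberately narrow illustration of such a task-specific check, the hypothetical sketch below flags arXiv-style citations in an output that do not resolve, a common symptom of fabricated references; it assumes network access and is not the paper's method:

    # A toy sketch (not the paper's method): flag arXiv-style citations in a model
    # output that do not resolve, one crude signal of fabricated references.
    # Assumes network access; the regex and URL pattern are illustrative.
    import re
    import urllib.request

    ARXIV_ID = re.compile(r"\b(\d{4}\.\d{4,5})(?:v\d+)?\b")

    def check_arxiv_citations(text: str) -> dict:
        """Return {arxiv_id: resolves} for each arXiv-style ID found in the text."""
        results = {}
        for match in ARXIV_ID.finditer(text):
            arxiv_id = match.group(1)
            try:
                with urllib.request.urlopen(f"https://arxiv.org/abs/{arxiv_id}", timeout=10) as resp:
                    results[arxiv_id] = resp.status == 200
            except Exception:
                results[arxiv_id] = False
        return results

    print(check_arxiv_citations("As argued in arXiv 2508.01781, hallucinations are inevitable."))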

Underlying Causes: Data, Model, and Prompt Factors

The paper provides a granular analysis of the root causes of hallucination:

  • Data-related: Noisy, biased, or outdated training data; insufficient representation; source-reference divergence.
  • Model-related: Auto-regressive generation, exposure bias, capability and belief misalignment, over-optimization, stochastic decoding, overconfidence, generalization failure, reasoning limitations, knowledge overshadowing, and extraction failures.
  • Prompt-related: Adversarial attacks, confirmatory bias, and poor prompting.

The emergent property of hallucination is attributed to the statistical, auto-regressive nature of LLMs, which optimize for plausible token sequences rather than factual or logical correctness.
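
The following toy sketch (the token scores are made up) shows how stochastic decoding over a plausibility distribution, rather than a fact check, can surface incorrect continuations, especially at higher sampling temperatures:

    # A toy sketch (probabilities are made up): sampling the next token from a
    # plausibility distribution rather than checking facts, as discussed above.
    import numpy as np

    rng = np.random.default_rng(0)
    tokens = ["Canberra", "Sydney", "Melbourne", "Auckland"]
    logits = np.array([2.0, 1.6, 0.8, -1.0])   # hypothetical scores; "Sydney" is close behind

    def sample(logits, temperature):
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs), probs

    for t in (0.2, 1.0, 1.5):
        idx, probs = sample(logits, t)
        print(f"T={t}: sampled {tokens[idx]!r}, P('Canberra')={probs[0]:.2f}")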

Human Factors and Cognitive Biases

The perception and impact of hallucinations are modulated by human factors:

  • User trust and interpretability: Fluency and confidence in LLM outputs can mislead users into overtrusting hallucinated content.
  • Cognitive biases: Automation bias, confirmation bias, and the illusion of explanatory depth increase susceptibility to hallucinations, even when users are warned of potential errors.
  • Design implications: Calibrated uncertainty displays, source-grounding indicators, justification prompts, and factuality-aware interfaces are recommended to enhance user resilience and support human-in-the-loop oversight.
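
As a minimal sketch of one ingredient of such calibrated uncertainty displays, the snippet below derives a crude confidence score from the average generated-token probability of a small open model; the model choice ("gpt2") and the averaging heuristic are illustrative assumptions, not the paper's recommendation:

    # A minimal sketch of surfacing model uncertainty to users: average generated-token
    # probability as a crude confidence proxy.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The capital of Australia is"
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False,
                         output_scores=True, return_dict_in_generate=True)

    gen_tokens = out.sequences[0, inputs["input_ids"].shape[1]:]
    probs = [torch.softmax(step[0], dim=-1)[tok_id].item()
             for step, tok_id in zip(out.scores, gen_tokens)]

    answer = tok.decode(gen_tokens, skip_special_tokens=True)
    confidence = sum(probs) / len(probs)      # naive average; real systems calibrate this
    print(f"{answer!r} (average token probability: {confidence:.2f})")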

Evaluation Benchmarks and Metrics

The paper surveys principal benchmarks and metrics for hallucination detection:

  • Benchmarks: TruthfulQA, HalluLens, FActScore, Q2, QuestEval, and domain-specific datasets (e.g., MedHallu, CodeHaluEval, HALLUCINOGEN).
  • Metrics: ROUGE, BLEU, BERTScore (surface/semantic similarity); FactCC, SummaC (entailment/NLI-based); KILT, RAE (knowledge-grounded); and human evaluation (correctness, faithfulness, coherence, harmfulness).
  • Limitations: Lack of standardization, task dependence, insensitivity to subtle hallucinations, and limited explainability.

The need for unified, taxonomy-aware, and context-sensitive evaluation frameworks is emphasized.
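
For instance, an entailment-based faithfulness check in the spirit of FactCC/SummaC can be sketched as follows (the specific NLI model and the 0.5 threshold are illustrative choices, not prescribed by the paper):

    # A minimal sketch of an NLI-based faithfulness check.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    nli_name = "roberta-large-mnli"
    tok = AutoTokenizer.from_pretrained(nli_name)
    nli = AutoModelForSequenceClassification.from_pretrained(nli_name)

    source = "The committee approved the proposal on 12 May after a unanimous vote."
    summary = "The committee rejected the proposal."   # intrinsic hallucination

    inputs = tok(source, summary, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(nli(**inputs).logits, dim=-1)[0]

    scores = {nli.config.id2label[i].lower(): p.item() for i, p in enumerate(probs)}
    print(scores)
    if scores.get("entailment", 0.0) < 0.5:
        print("Summary is not entailed by the source: possible hallucination.")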

Mitigation Strategies: Architectural and Systemic

Mitigation is addressed at both the model and system levels:

  • Architectural: Toolformer-style augmentation (external tool/API calls), retrieval-augmented generation (RAG), fine-tuning with synthetic or adversarially filtered data.
  • Systemic: Guardrails (logic validators, factual filters, rule-based fallbacks), symbolic integration, and hybrid context-aware systems.

No single technique suffices; layered, hybrid approaches tailored to application context are advocated.
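
As an example of the retrieval-augmented direction, the minimal sketch below grounds the prompt in retrieved passages before generation; the toy keyword retriever and the call_llm stand-in are assumptions for illustration, not the paper's architecture:

    # A minimal RAG sketch (illustrative, not the paper's system): a toy keyword
    # retriever grounds the prompt in trusted passages; call_llm is a hypothetical
    # stand-in for an actual LLM call.
    corpus = [
        "Canberra is the capital of Australia.",
        "Sydney is the largest city in Australia by population.",
        "The Great Barrier Reef lies off the coast of Queensland.",
    ]

    def retrieve(query: str, k: int = 2) -> list:
        """Rank passages by naive word overlap with the query."""
        q = set(query.lower().split())
        ranked = sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)
        return ranked[:k]

    def build_prompt(question: str) -> str:
        context = "\n".join(f"- {p}" for p in retrieve(question))
        return ("Answer using ONLY the context below; if it is insufficient, say so.\n"
                f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

    print(build_prompt("What is the capital of Australia?"))
    # The grounded prompt is then sent to a model, e.g. answer = call_llm(build_prompt(question))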

Monitoring and Benchmarking Resources

The paper highlights web-based resources for tracking LLM performance and hallucination trends:

  • Artificial Analysis: Intelligence, cost, latency, and multimodal benchmarks (Figures 1–3).

    Figure 1: Sample visualization of the AI Index, retrieved on 28 June 2025.

    Figure 2: Sample visualization of intelligence versus price, retrieved on 28 June 2025.

    Figure 3: Sample visualization of latency, retrieved on 28 June 2025.

  • Vectara Hallucination Leaderboard: Explicit tracking of hallucination rates in summarization (Figure 4).

    Figure 4: Sample visualization of the grounded hallucination rate using the Hughes hallucination evaluation model, retrieved on 29 June 2025.

  • Epoch AI Benchmarking Dashboard: Longitudinal trends in accuracy, compute, and open/proprietary model performance (Figures 5–8).

    Figure 5: Sample visualization of accuracy versus training compute, retrieved on 29 June 2025.

    Figure 6: Sample visualization of models with downloadable weights versus proprietary models, retrieved on 29 June 2025.

    Figure 7: Sample visualization of US versus non-US models, retrieved on 29 June 2025.

    Figure 8: Sample visualization of model performance on expert-level mathematics problems, retrieved on 29 June 2025.

  • LM Arena: Community-driven, real-world preference-based evaluation of model helpfulness and trustworthiness (Figures 9–10).

    Figure 9: Sample visualization of model performance on text generation, retrieved on 9 July 2025.

    Figure 10: Sample visualization of model performance for generative AI models capable of understanding and processing visual inputs, retrieved on 9 July 2025.

These resources facilitate transparent, reproducible, and up-to-date monitoring of LLM capabilities and hallucination risks.

Conclusion

The inevitability and multifaceted nature of hallucinations in LLMs, as rigorously established in this work, necessitate a paradigm shift from elimination to robust detection, mitigation, and human-centered oversight. The formal proofs of inevitability, combined with the detailed empirical taxonomy and causal analysis, provide a foundation for future research on reliable LLM deployment. Progress will depend on the development of unified taxonomies, context-aware evaluation frameworks, hybrid mitigation architectures, and user-centric interface designs. In high-stakes domains, continuous human-in-the-loop validation and external safeguards are essential. The paper's synthesis of theoretical, empirical, and practical perspectives offers a comprehensive roadmap for advancing the safety and trustworthiness of LLMs in real-world applications.

Explain it Like I'm 14

What is this paper about?

This paper explains why LLMs—the AI systems behind tools like chatbots—sometimes “hallucinate.” In AI, hallucination means the model confidently says things that sound right but are actually wrong or made up. The paper builds a clear map (a taxonomy) of different kinds of hallucinations, why they happen, how to check for them, and what we can do to reduce them. It also argues that some hallucinations are impossible to fully eliminate, no matter how good the model is.

What questions does the paper try to answer?

  • What exactly is an AI “hallucination,” and how should we define it?
  • Are hallucinations rare mistakes, or are they unavoidable?
  • What kinds of hallucinations exist, and how can we tell them apart?
  • What causes them—bad data, model design, the user’s prompt, or something else?
  • How can we detect, measure, and reduce hallucinations in practice?

How did the author study the problem?

The paper isn’t a single experiment. It’s a comprehensive review and framework that does three things:

  • It introduces a formal, math-based argument that hallucinations are unavoidable for any realistic LLM. Think of it as a proof that some errors will always slip through.
  • It organizes (taxonomizes) the different kinds of hallucinations into clear categories with examples.
  • It surveys causes, human factors (like trust and bias), tests and benchmarks, and practical fixes (like grounding answers in reliable sources and adding safety guardrails).

If “formal proof” sounds abstract, imagine a game where you try to write a rulebook that covers every possible tricky question. The paper argues that for any rulebook an LLM can follow, someone can still craft a question that makes it slip up. It’s like an unbeatable game of whack-a-mole: you can reduce mistakes, but you can’t remove them all.

What did the paper find?

1) Some level of hallucination is inevitable

Using ideas from computer science, the paper shows that for any computable LLM (which includes real-world models), there will always be questions that make it answer incorrectly. Not only that, but there can be infinitely many such questions. This means:

  • No model can guarantee 100% truthfulness on everything.
  • Models can’t “self-fix” hallucinations completely just by thinking harder.
  • Safety-critical uses (like medical or legal decisions) must include external checks and human oversight.

Why this matters: It shifts the goal from “eliminate all hallucinations” to “detect, limit, and manage them reliably.”

2) There are two key ways to classify hallucinations

To talk clearly about hallucinations, the paper separates them into simple, easy-to-spot types:

  • Intrinsic vs. Extrinsic
    • Intrinsic: The answer contradicts the given text or context. Example: The source says “approved,” but the summary says “rejected.”
    • Extrinsic: The answer invents details not supported by the context or reality. Example: Making up a fake animal or event.
  • Factuality vs. Faithfulness
    • Factuality: Is the answer true in the real world?
    • Faithfulness: Does the answer stick to the user’s prompt or the provided source?

These pairs overlap but help different teams (engineers, evaluators, users) discuss problems precisely.

3) Common ways hallucinations show up

Here are typical patterns the paper highlights:

  • Factual errors and fabrications: Wrong dates, fake citations, made-up facts.
  • Contextual mistakes: Adding details not in the source or contradicting it.
  • Instruction-following errors: Ignoring what the user asked (e.g., wrong language or format).
  • Logical errors: Step-by-step reasoning that falls apart midway.
  • Time mistakes: Outdated info or wrong timelines.
  • Ethical/legal harms: Defamation, dangerous advice, or bogus legal cases.
  • Task-specific issues: Code that looks right but fails; vision-LLMs naming objects that aren’t in the image; chatbots confusing people or events across a long conversation.

4) Why do hallucinations happen?

In short, LLMs are super-powered autocomplete. They predict the next word based on patterns they’ve seen, not because they truly “understand” the world. Several ingredients matter:

  • Data problems: Noisy, biased, incomplete, or outdated training data.
  • Model design: The auto-regressive setup (predicting one token at a time), reasoning limits, and overconfidence.
  • Decoding randomness: Settings that boost creativity can also increase errors.
  • Prompts: Tricky or misleading prompts (accidentally or on purpose) can push the model to make things up.
  • Human factors: People may trust confident-sounding text, even when it’s wrong.

5) How do we test and reduce hallucinations?

The paper reviews benchmarks and metrics used to detect hallucinations and discusses practical ways to cut them down:

  • Grounding the model: Use Retrieval-Augmented Generation (RAG) to pull in up-to-date, trustworthy sources during answering.
  • Tools and external systems: Let the model call calculators, databases, code runners, or search tools.
  • Guardrails and policies: Add filters, constraints, and safety checks before answers reach the user.
  • Better prompts and examples: Clear instructions and in-context examples reduce confusion.
  • Human oversight: Experts review answers, especially in sensitive domains.

No single fix works for everything; the best systems combine several of these.

Why does this matter, and what’s the impact?

As LLMs spread into schools, hospitals, courts, and businesses, their mistakes can have real consequences—misinformation, risky medical claims, financial losses, and reputational harm. This paper’s core message is realistic and responsible: total perfection isn’t possible, so aim for strong defenses. That means:

  • Design systems that check facts against reliable sources.
  • Keep humans in the loop for important decisions.
  • Use clear categories to diagnose what went wrong and improve targeted fixes.
  • Build better tests and shared definitions so the field can compare models fairly and improve faster.

In short: LLMs are powerful but imperfect. Treat them like skilled assistants who still need supervision, not like all-knowing authorities.
