
LLM Hallucinations

Updated 14 February 2026
  • Hallucination in LLMs is a phenomenon where models produce fluent yet factually unsupported outputs in open-world settings.
  • Detection methodologies include retrieval-based, uncertainty-based, and consistency checks to identify and localize factual errors.
  • Mitigation strategies, such as RAG, prompt engineering, and model calibration, help reduce hallucination incidence and improve reliability.

LLMs exhibit striking linguistic competence but are fundamentally prone to producing outputs that are inaccurate or fabricated, a phenomenon now termed hallucination. The term covers generated text that is fluent and syntactically correct but factually unsupported, incorrect, or unverifiable. Hallucination is increasingly recognized as a structural feature of deep learning models, particularly under the open-world assumption, where models confront an unbounded, ever-evolving environment and must generalize far beyond finite training data. Understanding the origins, theoretical bounds, typologies, detection and mitigation strategies, and design implications of hallucination in LLMs is central to both practical system reliability and the broader pursuit of artificial general intelligence.

1. Foundational Definitions and Theoretical Bounds

In formal terms, let $X$ be the (possibly infinite) input space of token sequences, $Y$ the output space, $f: X \to Y$ the ground-truth world function, and $\hat{f}: X \to Y$ the mapping realized by an LLM post-training. Hallucination on input $x \in X$ occurs iff $\hat{f}(x) \neq f(x)$ (Xu, 29 Sep 2025). The expected hallucination rate (or generalization error) under distribution $D$ is

$$L_D(\hat{f}) = \mathbb{P}_{(x,y)\sim D}\left[\hat{f}(x)\neq y\right].$$

Closed World Assumption: $D_\mathrm{train} = D_\mathrm{test}$. In this regime, classical learning theory ensures that with enough data, hallucinations can be made arbitrarily rare.

Open World Assumption: $D_\mathrm{test}$ is not constrained to equal $D_\mathrm{train}$; new inputs and tasks are encountered after training. In this regime, the No Free Lunch theorem dictates that there exist functions $f$ for which $\hat{f}(x) \neq f(x)$ for infinitely many $x$, and thus hallucinations are inevitable (Xu, 29 Sep 2025). No finite training set can immunize an LLM against hallucinations in an unbounded environment.
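The expected hallucination rate defined above can be estimated empirically on a labeled evaluation set. A minimal sketch (the function and data names are illustrative, and the toy "model" is just a lookup table standing in for an LLM):

```python
def hallucination_rate(model_fn, eval_set):
    """Empirical estimate of L_D(f_hat): the fraction of inputs
    on which the model's output disagrees with the ground truth."""
    errors = sum(1 for x, y in eval_set if model_fn(x) != y)
    return errors / len(eval_set)

# Toy ground-truth function f and an imperfect model f_hat.
truth = {"capital of France?": "Paris", "capital of Japan?": "Tokyo",
         "capital of Peru?": "Lima", "capital of Chad?": "N'Djamena"}
model = {"capital of France?": "Paris", "capital of Japan?": "Tokyo",
         "capital of Peru?": "Lima", "capital of Chad?": "Paris"}  # one hallucination

rate = hallucination_rate(lambda x: model[x], list(truth.items()))
print(rate)  # 0.25
```

Under the open-world assumption, this estimate is only valid for the distribution the evaluation set was drawn from; it says nothing about inputs outside that distribution.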

2. Hallucination Typologies: Structural, Empirical, and Categorical

2.1 Formal and Behavioral Taxonomies

  • Type I Hallucination (False Memorization): Occurs when $x \in S_\mathrm{train}$ but $\hat{f}(x) \neq f(x)$. These are corrigible in principle by correcting the training set or fine-tuning (Xu, 29 Sep 2025).
  • Type II Hallucination (False Generalization): $x \notin S_\mathrm{train}$ and $\hat{f}(x) \neq f(x)$. These reflect inevitable misgeneralization and cannot be eliminated under open-world conditions.
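Given access to the training inputs, an error can be typed mechanically under this taxonomy. A sketch (names are illustrative):

```python
def classify_hallucination(x, y_true, y_pred, train_inputs):
    """Type a model error per the Type I / Type II taxonomy:
    Type I  (false memorization)    : x was seen in training, output wrong anyway.
    Type II (false generalization)  : x unseen, output wrong.
    Returns None when the output is correct (no hallucination)."""
    if y_pred == y_true:
        return None
    return "Type I" if x in train_inputs else "Type II"

train = {"q1", "q2"}
print(classify_hallucination("q1", "a", "b", train))  # Type I
print(classify_hallucination("q3", "a", "b", train))  # Type II
print(classify_hallucination("q1", "a", "a", train))  # None
```

In practice, "seen in training" is itself fuzzy for web-scale corpora (near-duplicates, paraphrases), so this membership test is an idealization.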

Parallel taxonomies refine these distinctions:

  • Intrinsic: Output contradicts facts in the prompt or provided context (faithfulness failure).
  • Extrinsic: Output introduces unsupported or fabricated information not entailed in the context (Alansari et al., 5 Oct 2025), often aligned with fabrication, imitative falsehood, or context drift (Cheng et al., 2023).

Fine-grained categories have also been articulated: acronym ambiguity, numeric nuisance, generated golem (fabricated entities), virtual voice (misattributed quotes), geographic erratum, and temporal displacement (Rawte et al., 2023).

2.2 High-Confidence Failures (Delusions)

A critical distinction is labeled "delusion": a hallucination generated with abnormally high model confidence. Given a belief score $b(\hat{y}) \in [0,1]$, a hallucinated response $\hat{y}$ is a delusion if $b(\hat{y}) > \tau$, where $\tau$ is the average belief over correct answers. Delusions are insensitive to supervision and self-reflection, and dominate error rates as model confidence calibration degrades (Xu et al., 9 Mar 2025).
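The delusion criterion is straightforward to operationalize once a belief score is available. A minimal sketch (the belief values are made up for illustration; in practice they would come from the model's calibrated confidence):

```python
def delusion_threshold(correct_beliefs):
    """tau: the mean belief score the model assigns to correct answers."""
    return sum(correct_beliefs) / len(correct_beliefs)

def flag_delusions(hallucinations, tau):
    """A hallucinated answer is a delusion when the model believes it
    more strongly than it believes its correct answers on average."""
    return [(ans, b) for ans, b in hallucinations if b > tau]

tau = delusion_threshold([0.9, 0.8, 0.7])          # tau = 0.8
halluc = [("fabricated quote", 0.95), ("wrong date", 0.4)]
print(flag_delusions(halluc, tau))                 # [('fabricated quote', 0.95)]
```

The high-confidence case is the dangerous one: confidence-thresholded abstention filters out the low-belief hallucination but passes the delusion through.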

2.3 Cross-Linguistic and Task Variants

Analysis in multilingual settings reveals that hallucination rates vary dramatically across languages, correlating with data/resource availability. Fact-conflicting hallucinations, especially in low-resource languages, are more prevalent and challenging to detect (Chataigner et al., 2024). In fields like vision-language modeling, hallucination spans additional axes: object, attribute, and relationship errors (Xiao et al., 2024).

3. Root Causes: Model, Data, and Contextual Factors

Hallucinations emerge at multiple stages of the LLM development pipeline, from data collection through training and inference (Alansari et al., 5 Oct 2025).

In NLI and QA settings, sentence-level memorization and learned corpus-level statistical patterns lead to both false entailment and factual hallucination even in deterministic setups (McKenna et al., 2023).

4. Detection Methodologies: Metrics and Pipelines

Detection strategies span several methodologies (Alansari et al., 5 Oct 2025, Zhang et al., 27 Dec 2025, Yang et al., 20 Feb 2025):

  • Retrieval-based: Compare LLM outputs to trusted knowledge bases or retrieved textual evidence. High precision, but computationally heavy and reliant on retriever quality.
  • Uncertainty-based: Token-level or semantic entropy signals, often calibrated with reference sets or model ensembles. These can be computed even in closed-box settings (Zhang et al., 27 Dec 2025, Pesaranghader et al., 14 Jan 2026).
  • Consistency/self-verification: Re-asking questions with paraphrased prompts or mutated templates (metamorphic relations). Violations of internal consistency (e.g., contradiction under synonymic or antonymic reformulation) flag hallucinations (Yang et al., 20 Feb 2025).
  • Learning-based classifiers: Supervised or semi-supervised discriminators trained to distinguish reliable vs. hallucinated outputs using embeddings, attention, and composite metrics (Chen et al., 2024, Zhang et al., 27 Dec 2025).
  • Multiple-testing hypothesis frameworks: Apply FDR-controlled conformal testing using multiple independent uncertainty or similarity scores, providing provable guarantees on false-positive rates (Li et al., 25 Aug 2025).
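The consistency/self-verification idea can be sketched concretely: ask the same question under paraphrased prompts and measure agreement, flagging low-agreement answers as likely hallucinations. A minimal illustration (the canned lookup table is a stand-in for actual LLM calls):

```python
from collections import Counter

def consistency_score(ask_fn, prompts):
    """Self-consistency check: pose semantically equivalent prompts and
    return the majority answer with its agreement rate. Low agreement
    (answers drifting across paraphrases) flags a likely hallucination."""
    answers = [ask_fn(p) for p in prompts]
    top, count = Counter(answers).most_common(1)[0]
    return top, count / len(answers)

# Stand-in for an LLM: the answer drifts on one paraphrase.
canned = {"Who wrote Hamlet?": "Shakespeare",
          "Hamlet's author is?": "Shakespeare",
          "Name the playwright of Hamlet.": "Marlowe"}

answer, agreement = consistency_score(canned.get, list(canned))
print(answer, round(agreement, 2))  # Shakespeare 0.67
```

Real pipelines extend this with metamorphic relations (synonymic and antonymic reformulations) and treat contradictions between variants, not just disagreement rates, as the hallucination signal.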

Segment-based (local) evaluation is critical in summarization and long-form generation, as hallucinations can concentrate in small spans while most of the text remains factual (Zhang et al., 27 Dec 2025, Xiao et al., 2024).
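Segment-based evaluation amounts to scoring spans independently rather than averaging over the whole output. A sketch (the scorer here is a hard-coded lookup standing in for a retrieval- or entailment-based factuality checker):

```python
def segment_scores(text, score_fn):
    """Score each sentence separately so hallucinations can be localized
    to the spans that contain them, instead of being diluted by the
    mostly-factual remainder of the text."""
    sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
    return [(s, score_fn(s)) for s in sentences]

def flag_spans(scored, threshold=0.5):
    return [s for s, score in scored if score < threshold]

# Illustrative per-sentence factuality scores (the 1650 date is false:
# the Eiffel Tower was completed in 1889).
fake_scores = {"The Eiffel Tower is in Paris.": 0.95,
               "It was completed in 1650.": 0.1}
text = "The Eiffel Tower is in Paris. It was completed in 1650."
scored = segment_scores(text, fake_scores.get)
print(flag_spans(scored))  # ['It was completed in 1650.']
```

A document-level average of these two scores would sit above 0.5 and mask the error; the span-level view pinpoints it.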

5. Mitigation and Management Strategies

No mitigation strategy can eliminate hallucinations in the open world, but several approaches, including retrieval-augmented generation (RAG), prompt engineering, and model calibration, substantially reduce their prevalence or impact.
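The RAG pattern can be sketched end to end: retrieve evidence for the query, then constrain the prompt to that evidence with an explicit instruction to abstain when it is insufficient. Everything below is illustrative; a production system would use a dense or hybrid retriever rather than word overlap:

```python
def retrieve(query, corpus, k=2):
    """Toy lexical retriever: rank passages by word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def grounded_prompt(query, corpus):
    """Retrieval-augmented prompt: answer only from retrieved evidence,
    with an explicit abstention instruction to discourage fabrication."""
    evidence = "\n".join(f"- {p}" for p in retrieve(query, corpus))
    return (f"Answer using ONLY the evidence below; say 'I don't know' "
            f"if it is insufficient.\nEvidence:\n{evidence}\n"
            f"Question: {query}")

corpus = ["The Eiffel Tower was completed in 1889.",
          "Paris is the capital of France.",
          "Mount Fuji is in Japan."]
print(grounded_prompt("When was the Eiffel Tower completed?", corpus))
```

Grounding shifts the failure mode: instead of fabricating, a well-instructed model can abstain, and errors become traceable to retriever misses rather than to ungrounded generation.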

6. Structural and Architectural Implications

Recent theoretical and empirical arguments converge on the view that hallucination is structurally inevitable in existing transformer-based LLMs (Xu, 29 Sep 2025, Ackermann et al., 19 Sep 2025). Because self-attention fields lack existential or temporal grounding, models default to generating fluent but unmoored continuations whenever world information is absent or ambiguous. Proposals for architectural innovation include:

  • Truth-constrained generation: Embedding explicit verification or abstention drives alongside traditional autoregressive continuation, permitting calibrated refusal when outputs cannot be grounded (Ackermann et al., 19 Sep 2025).
  • Embedding symbolic and causal constraints: Incorporating event order, causal relationship graphs, or affordance maps to shape possible continuations (Ackermann et al., 19 Sep 2025).
  • Hybrid parametric/non-parametric stacks: Combining parametric neural models with symbolic retrieval or logic-based modules for stronger factual alignment (Alansari et al., 5 Oct 2025).
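The abstention drive in truth-constrained generation reduces, in its simplest form, to a calibrated refusal rule: emit the answer only when a calibrated confidence clears a threshold. A minimal sketch (the generator and confidence function are toy lookups; real systems would derive confidence from calibrated model signals):

```python
def answer_or_abstain(generate_fn, confidence_fn, query, tau=0.75):
    """Calibrated refusal: return the generated answer only when the
    model's confidence clears the threshold; otherwise abstain rather
    than emit a fluent but ungrounded continuation."""
    answer = generate_fn(query)
    conf = confidence_fn(query, answer)
    return answer if conf >= tau else "I don't know."

gen = lambda q: {"capital of France?": "Paris",
                 "capital of Wakanda?": "Birnin Zana"}[q]   # Wakanda is fictional
conf = lambda q, a: {"Paris": 0.97, "Birnin Zana": 0.30}[a]

print(answer_or_abstain(gen, conf, "capital of France?"))   # Paris
print(answer_or_abstain(gen, conf, "capital of Wakanda?"))  # I don't know.
```

Note the connection to Section 2.2: this rule only works when confidence is well calibrated; a delusion, by definition, carries high belief and sails past the threshold.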

Engineering for controllability, introspectability, and reliable uncertainty estimation is essential. Training separate hallucination detectors simply transfers the generalization problem unless detectors themselves are equipped with uncertainty awareness calibrated to the model’s knowledge boundaries (Xu, 29 Sep 2025).

7. Evaluation Benchmarks and Open Challenges

A diverse set of task-specific benchmarks now exists for measuring hallucination, with labeled datasets in QA (TruthfulQA, HaluEval, HalluQA), summarization (CNN/DailyMail, XSum, FactCC), dialogue (DialFact, WoW), and multilingual freeform generation (HalOmi, Mu-SHROOM) (Alansari et al., 5 Oct 2025, Cheng et al., 2023, Chataigner et al., 2024, Lee et al., 19 Feb 2025).

Evaluation increasingly incorporates both automatic metrics (AUROC, F1, calibration error) and LLM/human-as-judge protocols for adjudicating factuality at scale. There is convergent emphasis on systemic, root cause–aware pipelines rather than one-off patches or ad hoc filters (Pesaranghader et al., 14 Jan 2026).

