LLM Hallucinations
- Hallucination in LLMs is a phenomenon where models produce fluent yet factually unsupported outputs in open-world settings.
- Detection methodologies include retrieval-based, uncertainty-based, and consistency checks to identify and localize factual errors.
- Mitigation strategies, such as RAG, prompt engineering, and model calibration, help reduce hallucination incidence and improve reliability.
LLMs exhibit striking linguistic competence but are fundamentally prone to producing outputs that are inaccurate or fabricated, a phenomenon now termed hallucination. The term covers generated text that is fluent and syntactically correct but factually unsupported, incorrect, or unverifiable. Hallucination is increasingly recognized as a structural feature of deep learning models, particularly under the open-world assumption, where models confront an unbounded, ever-evolving environment and must generalize far beyond finite training data. Understanding the origins, theoretical bounds, typologies, detection/mitigation strategies, and design implications of hallucination in LLMs is central to both practical system reliability and the broader pursuit of artificial general intelligence.
1. Foundational Definitions and Theoretical Bounds
In formal terms, let $\mathcal{X}$ be the (possibly infinite) input space of token sequences, $\mathcal{Y}$ the output space, $f:\mathcal{X}\to\mathcal{Y}$ the ground-truth world function, and $h:\mathcal{X}\to\mathcal{Y}$ the mapping realized by an LLM post-training. Hallucination on input $x$ occurs iff $h(x)\neq f(x)$ (Xu, 29 Sep 2025). The expected hallucination rate (or generalization error) under distribution $D$ is $R_D(h)=\Pr_{x\sim D}[h(x)\neq f(x)]$.
Closed World Assumption: $D_{\text{test}}=D_{\text{train}}$. In this regime, classical learning theory ensures that with enough data, hallucinations can be made arbitrarily rare.
Open World Assumption: $D_{\text{test}}$ is not constrained to equal $D_{\text{train}}$; new inputs and tasks are encountered after training. In this regime, the No Free Lunch theorem dictates that there exist target functions $f$ for which $h(x)\neq f(x)$ for infinitely many inputs $x$, and thus hallucinations are inevitable (Xu, 29 Sep 2025). No finite training set can immunize an LLM against hallucinations in an unbounded environment.
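The closed- vs. open-world contrast can be illustrated with a toy simulation (a hypothetical sketch, not from the cited paper): a model that perfectly memorizes a finite training set still errs on roughly half of the novel inputs an open world keeps supplying.

```python
# Toy illustration (hypothetical, not from the cited paper): a "model" that
# memorizes a finite training set and emits a default guess on everything else.
def f(x: int) -> int:
    return x % 2  # ground-truth world function: parity

train = {x: f(x) for x in range(100)}  # finite training data

def h(x: int) -> int:
    # Perfect recall on seen inputs; a fixed guess on unseen ones.
    return train.get(x, 0)

# Closed world: evaluation draws from the same inputs the model trained on.
closed_rate = sum(h(x) != f(x) for x in range(100)) / 100

# Open world: novel inputs keep arriving after training.
open_rate = sum(h(x) != f(x) for x in range(100, 1100)) / 1000
print(closed_rate, open_rate)  # hallucination rate: 0.0 seen, 0.5 unseen
```

More data shrinks the memorized region but never closes it: for any finite `train`, an unbounded input stream eventually leaves it.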
2. Hallucination Typologies: Structural, Empirical, and Categorical
2.1 Formal and Behavioral Taxonomies
- Type I Hallucination (False Memorization): Occurs when $x$ appears in the training set $S$ but $h(x)\neq f(x)$. These are corrigible in principle by correcting the training set or fine-tuning (Xu, 29 Sep 2025).
- Type II Hallucination (False Generalization): $x\notin S$ and $h(x)\neq f(x)$. These reflect inevitable misgeneralization and cannot be eliminated under open-world conditions.
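The Type I / Type II split reduces to a membership check against the training set; a minimal sketch (function and example names are hypothetical):

```python
# Hypothetical sketch of the Section 2.1 split: a hallucination (output
# disagrees with ground truth) is Type I when the input was in the training
# set, Type II when the model had to generalize beyond it.
def hallucination_type(x, model_out, truth, training_inputs):
    if model_out == truth:
        return None  # not a hallucination
    return "Type I" if x in training_inputs else "Type II"

seen = {"capital of France"}
t1 = hallucination_type("capital of France", "Lyon", "Paris", seen)          # memorization failure
t2 = hallucination_type("capital of Australia", "Sydney", "Canberra", seen)  # generalization failure
```

The distinction matters operationally: Type I errors are addressable by data cleaning or fine-tuning, while Type II errors call for abstention or external grounding.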
Parallel taxonomies refine these distinctions:
- Intrinsic: Output contradicts facts in the prompt or provided context (faithfulness failure).
- Extrinsic: Output introduces unsupported or fabricated information not entailed in the context (Alansari et al., 5 Oct 2025), often aligned with fabrication, imitative falsehood, or context drift (Cheng et al., 2023).
Fine-grained categories have also been articulated: acronym ambiguity, numeric nuisance, generated golem (fabricated entities), virtual voice (misattributed quotes), geographic erratum, and temporal displacement (Rawte et al., 2023).
2.2 High-Confidence Failures (Delusions)
A critical subclass is the "delusion": a hallucination generated with abnormally high model confidence. Given a belief score $b(x, y)$, a hallucinated response $\hat{y}$ to input $x$ is a delusion if $b(x, \hat{y}) \geq \bar{b}$, where $\bar{b}$ is the average belief the model assigns to its correct answers. Delusions are insensitive to supervision and self-reflection, and dominate error rates as model confidence calibration degrades (Xu et al., 9 Mar 2025).
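The delusion criterion can be sketched as follows, assuming per-answer belief scores are available (record field names are hypothetical):

```python
# Sketch of the delusion criterion above: a wrong answer is a delusion when
# the model believes it at least as strongly as its average correct answer.
def find_delusions(records):
    # records: dicts with keys "belief" (float) and "correct" (bool)
    correct_beliefs = [r["belief"] for r in records if r["correct"]]
    threshold = sum(correct_beliefs) / len(correct_beliefs)
    return [r for r in records if not r["correct"] and r["belief"] >= threshold]

data = [
    {"belief": 0.9, "correct": True},
    {"belief": 0.7, "correct": True},    # average correct belief = 0.8
    {"belief": 0.95, "correct": False},  # delusion: wrong but highly believed
    {"belief": 0.3, "correct": False},   # ordinary hallucination
]
delusions = find_delusions(data)
```

The threshold is relative to the model's own calibration, which is why delusions grow as calibration degrades: the correct-answer baseline stops separating right from wrong.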
2.3 Cross-Linguistic and Task Variants
Analysis in multilingual settings reveals that hallucination rates vary dramatically across languages, correlating with data/resource availability. Fact-conflicting hallucinations, especially in low-resource languages, are more prevalent and challenging to detect (Chataigner et al., 2024). In fields like vision-language modeling, hallucination spans additional axes: object, attribute, and relationship errors (Xiao et al., 2024).
3. Root Causes: Model, Data, and Contextual Factors
Hallucinations emerge throughout the LLM development pipeline (Alansari et al., 5 Oct 2025):
- Model-centric factors: Transformer autoregressive decoding, self-attention’s lack of existential/temporal grounding (Ackermann et al., 19 Sep 2025), softmax bottlenecks, shortcut correlations in high-dimensional token spaces.
- Data-centric factors: Web-corpus biases, knowledge conflicts, out-of-date information, and long-tail sparsity result in overgeneralization or imitative falsehoods (Alansari et al., 5 Oct 2025, Cheng et al., 2023).
- Contextual/inference factors: Prompt ambiguity, domain/task distribution shift, retrieval pipeline inconsistencies in RAG setups, and exposure bias in autoregressive decoding (Alansari et al., 5 Oct 2025, Pesaranghader et al., 14 Jan 2026).
In NLI and QA settings, sentence-level memorization and learned corpus-level statistical patterns lead to both false entailment and factual hallucination even in deterministic setups (McKenna et al., 2023).
4. Detection Methodologies: Metrics and Pipelines
Detection strategies span several methodologies (Alansari et al., 5 Oct 2025, Zhang et al., 27 Dec 2025, Yang et al., 20 Feb 2025):
- Retrieval-based: Compare LLM outputs to trusted knowledge bases or retrieved textual evidence. High precision but computationally heavy, and reliant on retriever quality.
- Uncertainty-based: Token-level or semantic entropy signals, often calibrated with reference sets or model ensembles. These can be computed even in closed-box settings (Zhang et al., 27 Dec 2025, Pesaranghader et al., 14 Jan 2026).
- Consistency/self-verification: Re-asking questions with paraphrased prompts or mutated templates (metamorphic relations). Violations of internal consistency (e.g., contradiction under synonymic or antonymic reformulation) flag hallucinations (Yang et al., 20 Feb 2025).
- Learning-based classifiers: Supervised or semi-supervised discriminators trained to distinguish reliable vs. hallucinated outputs using embeddings, attention, and composite metrics (Chen et al., 2024, Zhang et al., 27 Dec 2025).
- Multiple-testing hypothesis frameworks: Apply FDR-controlled conformal testing using multiple independent uncertainty or similarity scores, providing provable guarantees on false-positive rates (Li et al., 25 Aug 2025).
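As one concrete instance of the multiple-testing idea, the Benjamini-Hochberg step-up procedure flags claims while controlling the false discovery rate at level q, assuming a per-claim p-value for the null hypothesis "this claim is faithful". Producing valid p-values (e.g. from conformal uncertainty scores) is the substantive part addressed by (Li et al., 25 Aug 2025); the values below are made up for illustration.

```python
# Illustrative sketch: Benjamini-Hochberg step-up over per-claim p-values.
# Rejected nulls are flagged as likely hallucinations with FDR <= q.
def benjamini_hochberg(pvals, q=0.1):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= q * rank / m:
            cutoff = rank  # largest rank whose p-value clears the BH line
    return sorted(order[:cutoff])  # indices of flagged claims

flagged = benjamini_hochberg([0.001, 0.40, 0.012, 0.90, 0.03], q=0.1)
print(flagged)  # claims 0, 2, and 4 are flagged
```

Unlike a fixed per-claim threshold, the BH line adapts to how many claims are tested, which is what yields the provable false-positive guarantee.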
Segment-based (local) evaluation is critical in summarization and long-form generation, as hallucinations can concentrate in small spans while most of the text remains factual (Zhang et al., 27 Dec 2025, Xiao et al., 2024).
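A minimal sketch of segment-level scoring, with a placeholder checker standing in for any real retrieval- or uncertainty-based scorer (the splitting regex and checker are illustrative assumptions):

```python
import re

# Minimal sketch of segment-level (local) evaluation: split a generation into
# sentences and score each one, so a single fabricated span cannot hide
# behind an otherwise-factual document.
def segment_scores(text, score_fn):
    segments = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [(seg, score_fn(seg)) for seg in segments]

# Placeholder checker: a real system would query a retriever or estimator.
toy_checker = lambda seg: 0.0 if "fabricated" in seg else 1.0
scores = segment_scores(
    "The sky is blue. This claim is fabricated. Water is wet.", toy_checker
)
```

Aggregating with a minimum rather than a mean over segments makes the document-level score reflect the worst local span, matching the motivation above.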
5. Mitigation and Management Strategies
No mitigation strategy can eliminate hallucinations in the open world, but multiple approaches substantially reduce their prevalence or impact:
- Retrieval-augmented generation (RAG): Fact grounding via external document integration at inference time reduces both ordinary hallucinations and delusions (Xu et al., 9 Mar 2025, Alansari et al., 5 Oct 2025).
- Prompt engineering: Instruction layering, in-context learning, chain-of-thought prompting, self-consistency voting, and explicit abstention instruction (refusal training) (Alansari et al., 5 Oct 2025, Ji et al., 2023).
- Direct preference optimization (DPO), RLHF, and contrastive fine-tuning: Preference datasets and structured loss objectives penalize unsupported generation, often with severity weighting (Xiao et al., 2024).
- Model calibration: Temperature scaling, isotonic regression, and Bayesian post-hoc calibration to align model confidence with factual accuracy (Pesaranghader et al., 14 Jan 2026).
- External consensus and multi-agent pipelines: Multi-model debate/voting architectures significantly reduce the incidence of persistent, high-confidence hallucinations (Xu et al., 9 Mar 2025).
- Entropy- and evidence-guided regeneration: Automatic identification and rewriting of high-entropy or ungrounded spans using lower-vulnerability models or fact-checkers (Rawte et al., 2023).
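Of the calibration methods listed above, temperature scaling is the simplest to sketch: fit a single temperature T on held-out (logits, label) pairs by minimizing negative log-likelihood, then divide logits by T at inference time. Grid search and the toy validation set below are illustrative choices, not from the cited work.

```python
import math

# Hedged sketch of temperature scaling on toy data.
def softmax(logits, T):
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nll(val_set, T):
    # Negative log-likelihood of the true labels under temperature T.
    return -sum(math.log(softmax(logits, T)[y]) for logits, y in val_set)

def fit_temperature(val_set):
    grid = [0.5 + 0.1 * k for k in range(31)]  # T in [0.5, 3.5]
    return min(grid, key=lambda T: nll(val_set, T))

# Overconfident toy model: a logit gap of 2 but only 2/3 accuracy, so the
# fitted temperature comes out well above 1 (softening the distribution).
val = [([2.0, 0.0], 0), ([2.0, 0.0], 1), ([0.0, 2.0], 1)]
T = fit_temperature(val)
```

Because T rescales all logits uniformly, temperature scaling changes confidence but never the argmax answer; it aligns belief with accuracy rather than fixing wrong outputs.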
6. Structural and Architectural Implications
Recent theoretical and empirical arguments converge on the view that hallucination is structurally inevitable in existing transformer-based LLMs (Xu, 29 Sep 2025, Ackermann et al., 19 Sep 2025). Because self-attention fields lack existential or temporal grounding, models default to generating fluent but unmoored continuations whenever world information is absent or ambiguous. Proposals for architectural innovation include:
- Truth-constrained generation: Embedding explicit verification or abstention drives alongside traditional autoregressive continuation, permitting calibrated refusal when outputs cannot be grounded (Ackermann et al., 19 Sep 2025).
- Embedding symbolic and causal constraints: Incorporating event order, causal relationship graphs, or affordance maps to shape possible continuations (Ackermann et al., 19 Sep 2025).
- Hybrid parametric/non-parametric stacks: Combining parametric neural models with symbolic retrieval or logic-based modules for stronger factual alignment (Alansari et al., 5 Oct 2025).
Engineering for controllability, introspectability, and reliable uncertainty estimation is essential. Training separate hallucination detectors simply transfers the generalization problem unless detectors themselves are equipped with uncertainty awareness calibrated to the model’s knowledge boundaries (Xu, 29 Sep 2025).
7. Evaluation Benchmarks and Open Challenges
A diverse set of task-specific benchmarks now exists for measuring hallucination, with labeled datasets in QA (TruthfulQA, HaluEval, HalluQA), summarization (CNN/DailyMail, XSum, FactCC), dialogue (DialFact, WoW), and multilingual freeform generation (HalOmi, Mu-SHROOM) (Alansari et al., 5 Oct 2025, Cheng et al., 2023, Chataigner et al., 2024, Lee et al., 19 Feb 2025). Key challenges include:
- Universal detection and robust cross-domain generalization for detectors, particularly in low-resource and non-English contexts (Chataigner et al., 2024, Zhang et al., 27 Dec 2025).
- Granularity in hallucination localization, including segment/sentence/claim-level detection (Xiao et al., 2024, Lee et al., 19 Feb 2025).
- Explainability and transparency about the provenance and uncertainty of output claims (Alansari et al., 5 Oct 2025, Pesaranghader et al., 14 Jan 2026).
- Dynamic lifelong learning and continual calibration as LLM deployments encounter new domains and information (Xu, 29 Sep 2025).
Evaluation increasingly incorporates both automatic metrics (AUROC, F1, calibration error) and LLM/human-as-judge protocols for adjudicating factuality at scale. There is convergent emphasis on systemic, root cause–aware pipelines rather than one-off patches or ad hoc filters (Pesaranghader et al., 14 Jan 2026).
References
- (Xu, 29 Sep 2025) Hallucination is Inevitable for LLMs with the Open World Assumption
- (Zhang et al., 27 Dec 2025) Hallucination Detection and Evaluation of LLM
- (Yao et al., 2023) LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples
- (Xu et al., 9 Mar 2025) Delusions of LLMs
- (Alansari et al., 5 Oct 2025) LLMs Hallucination: A Comprehensive Survey
- (Xiao et al., 2024) Detecting and Mitigating Hallucination in Large Vision LLMs via Fine-Grained AI Feedback
- (Ackermann et al., 19 Sep 2025) How LLMs are Designed to Hallucinate
- (Chen et al., 2024) Hallucination Detection: Robustly Discerning Reliable Answers in LLMs
- (Ji et al., 2023) Towards Mitigating Hallucination in LLMs via Self-Reflection
- (McKenna et al., 2023) Sources of Hallucination by LLMs on Inference Tasks
- (Jiang et al., 2024) On LLMs' Hallucination with Regard to Known Facts
- (Cheng et al., 2023) Evaluating Hallucinations in Chinese LLMs
- (Li et al., 25 Aug 2025) Principled Detection of Hallucinations in LLMs via Multiple Testing
- (Lee et al., 19 Feb 2025) REFIND at SemEval-2025 Task 3: Retrieval-Augmented Factuality Hallucination Detection in LLMs
- (Pesaranghader et al., 14 Jan 2026) Hallucination Detection and Mitigation in LLMs
- (Chataigner et al., 2024) Multilingual Hallucination Gaps in LLMs
- (Rawte et al., 2023) The Troubling Emergence of Hallucination in LLMs -- An Extensive Definition, Quantification, and Prescriptive Remediations
- (Yang et al., 20 Feb 2025) Hallucination Detection in LLMs with Metamorphic Relations