Interpreting the causes of hallucinations in large language models

Develop interpretability methods and causal diagnostics that explain why large language models hallucinate, including identifying query- and model-specific mechanisms that lead to hallucinations in retrieval-augmented generation systems used for legal research.

Background

Despite improvements from retrieval-augmented generation, the cited study documents persistent hallucinations in AI legal research tools, including misinterpretation of case holdings, misuse of authority, and outright fabrication. These varied failure modes point to deeper explanatory gaps about when and why models produce incorrect or misleading outputs.

Because proprietary systems reveal limited technical details and mix multiple components (retrieval, filtering, generation), pinpointing the causal mechanisms behind hallucinations remains difficult. The authors introduce a typology of RAG-related errors to aid analysis, but explicitly note that interpreting why models hallucinate is still an open problem.
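One way such a causal diagnostic could be framed is as a stage-level ablation: run the same query through the pipeline with components isolated, and attribute the failure to retrieval (nothing relevant was found) or generation (the model ignored its evidence). The sketch below uses toy stand-in components, not any real or proprietary system described in the study; `retrieve`, `generate`, and `diagnose` are hypothetical illustrations.

```python
# Hypothetical sketch: localizing a RAG hallucination by ablating
# pipeline stages. All components are toy stand-ins for illustration.

def retrieve(query, corpus):
    # Toy retriever: return passages sharing any keyword with the query.
    return [p for p in corpus
            if any(w in p.lower() for w in query.lower().split())]

def generate(query, passages):
    # Toy generator: grounded when evidence exists, otherwise it
    # "hallucinates" by fabricating a holding.
    if passages:
        return f"Per the retrieved authority: {passages[0]}"
    return "The court held X."  # ungrounded, fabricated output

def diagnose(query, corpus):
    """Attribute a hallucination to retrieval vs. generation by ablation."""
    passages = retrieve(query, corpus)
    answer = generate(query, passages)
    if not passages:
        return "retrieval_failure"   # nothing relevant was retrieved
    if not any(p in answer for p in passages):
        return "generation_failure"  # model ignored its evidence
    return "grounded"

corpus = ["Smith v. Jones holds that contracts require consideration."]
print(diagnose("consideration contracts", corpus))  # -> grounded
print(diagnose("unrelated tax question", corpus))   # -> retrieval_failure
```

In a real system the same logic would require access to intermediate pipeline outputs, which the study notes proprietary tools do not expose; that opacity is precisely what makes this attribution difficult.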

References

Interpreting why an LLM hallucinates is an open problem.

Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools (2405.20362 - Magesh et al., 2024), Section 6.4 (A Typology of Legal RAG Errors)