Hallucination Rates in Real-World PA Letter Generation

Determine the hallucination rate of large language model (LLM)-generated prior authorization (PA) letters under less structured, real-world deployment conditions, where input records may be incomplete or ambiguous, and quantify the associated safety risks for clinical AI deployment.

Background

The study found zero detected clinical hallucinations across 135 letters generated by GPT-4o, Claude Sonnet 4.5, and Gemini 2.5 Pro, but emphasized that this result occurred in a constrained setting where all relevant clinical facts were explicitly provided in the prompt.
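The excerpt does not spell out how hallucinations were detected. Purely as an illustration of what claim-level checking against prompt-supplied facts could look like, the sketch below treats a source record as a flat dict of clinical facts and matches generated sentences against recorded values; the flat-dict format, the sentence splitting, and the substring matching are all simplifying assumptions, not the authors' protocol.

```python
# Hypothetical claim-level hallucination check. Assumes the source record
# is a flat dict of clinical facts (a simplification for illustration).

def extract_claims(letter: str, fields: list[str]) -> dict[str, str]:
    """Naive extraction: keep sentences that mention a known field name."""
    claims = {}
    for sentence in letter.split("."):
        for field in fields:
            if field.replace("_", " ") in sentence.lower():
                claims[field] = sentence.strip()
    return claims

def hallucinated_claims(claims: dict[str, str], record: dict[str, str]) -> list[str]:
    """Flag a claim if it cites a field missing from the record, or a field
    whose recorded value does not appear in the claiming sentence."""
    flagged = []
    for field, sentence in claims.items():
        value = record.get(field)
        if value is None or str(value).lower() not in sentence.lower():
            flagged.append(field)
    return flagged
```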

The authors caution that real-world deployments will often involve incomplete or ambiguous records, which may increase factual error risks. They therefore identify the need to measure hallucination rates under less structured conditions as a key unresolved issue for safe clinical use.
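One way to probe this open question is to degrade the structured input and re-score. A minimal sketch, assuming a hypothetical `generate_letter` callable wrapping an LLM API and reusing the claim-checking helpers above: randomly ablate record fields, generate a letter from the partial record, and verify every claim against the full chart, so that values the model fabricates for dropped fields are counted as hallucinations.

```python
import random

def ablate(record: dict[str, str], drop_rate: float, rng: random.Random) -> dict[str, str]:
    """Simulate an incomplete chart by randomly dropping fields."""
    return {k: v for k, v in record.items() if rng.random() > drop_rate}

def hallucination_rate(records, generate_letter, drop_rate=0.3, seed=0):
    """Fraction of letters containing at least one unsupported clinical claim.

    `generate_letter` is a hypothetical LLM wrapper; `extract_claims` and
    `hallucinated_claims` are from the sketch above.
    """
    rng = random.Random(seed)
    flagged_letters = 0
    for record in records:
        partial = ablate(record, drop_rate, rng)
        letter = generate_letter(partial)
        claims = extract_claims(letter, list(record.keys()))
        # Verify against the *full* record: anything the model asserts that
        # the complete chart does not support counts as a hallucination.
        if hallucinated_claims(claims, record):
            flagged_letters += 1
    return flagged_letters / len(records)
```

Verifying against the full record rather than the ablated one is the key design choice here: it separates harmless omissions from invented content, which is the failure mode the authors flag as the safety risk.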

References

"The hallucination rate under less structured conditions remains an important open question for the safety of clinical AI deployment."

AI-Generated Prior Authorization Letters: Strong Clinical Content, Weak Administrative Scaffolding (2603.29366 - Awan et al., 31 Mar 2026), Hallucination Assessment subsection, Results section.