Confirm whether certain OpenAI search-augmented models generate ungrounded citation URLs

Ascertain whether the OpenAI search-augmented models gpt-4.1, gpt-4.1-mini, gpt-4o-search-preview, and gpt-4o-mini-search-preview generate citation URLs that are not grounded in actual retrieval results. The hypothesis arises because each model's non-resolving URL rate equals its hallucinated URL rate, a pattern that points to fabrication rather than link rot.

Background

The authors find that four OpenAI search-augmented models show non-resolving URL rates equal to hallucinated URL rates, implying zero stale URLs and suggesting that all broken citations are fabricated. This pattern is consistent with URL generation ungrounded in actual retrieval.

Because the internal retrieval pipelines are proprietary, the authors cannot confirm whether the models emit URLs that were never retrieved. Establishing the grounding behavior of these models would clarify whether their citation process is decoupled from retrieval, since grounding citations in retrieved documents is a key safeguard against fabricated URLs.
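The fabrication-versus-link-rot distinction above can be made concrete with a small sketch. Assume each citation URL has been labeled by an external check, for example an HTTP request for current resolvability and a web-archive lookup as a proxy for whether the URL ever existed (the `Citation` class and `failure_rates` helper below are illustrative, not the paper's actual methodology). A stale URL is one that once existed but no longer resolves; if the non-resolving rate equals the hallucinated rate, the stale rate is zero.

```python
from dataclasses import dataclass

@dataclass
class Citation:
    url: str
    resolves: bool      # URL currently returns a successful HTTP response
    ever_existed: bool  # URL found in a web archive (proxy for "not fabricated")

def failure_rates(citations):
    """Break down broken citations into fabricated vs. stale (link rot)."""
    n = len(citations)
    non_resolving = [c for c in citations if not c.resolves]
    hallucinated = [c for c in non_resolving if not c.ever_existed]  # fabricated
    stale = [c for c in non_resolving if c.ever_existed]             # link rot
    return {
        "non_resolving_rate": len(non_resolving) / n,
        "hallucinated_rate": len(hallucinated) / n,
        "stale_rate": len(stale) / n,
    }

# The pattern reported for the four OpenAI models: every broken URL is
# fabricated, so non_resolving_rate == hallucinated_rate and stale_rate == 0.
sample = [
    Citation("https://example.com/a", resolves=True, ever_existed=True),
    Citation("https://example.com/b", resolves=True, ever_existed=True),
    Citation("https://example.com/c", resolves=False, ever_existed=False),
    Citation("https://example.com/d", resolves=False, ever_existed=False),
]
rates = failure_rates(sample)
```

In this sketch, equality of the two rates on real citation data would be consistent with URLs generated without grounding in retrieval, while a positive stale rate would indicate ordinary link rot.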

References

"This pattern is consistent with URL generation that is not grounded in actual retrieval results, though we cannot confirm this without access to the internal retrieval pipeline."

Detecting and Correcting Reference Hallucinations in Commercial LLMs and Deep Research Agents (2604.03173 - Rao et al., 3 Apr 2026), Section 4.4 (RQ4: What fraction of citation failures is fabrication versus link rot?)