Disentangling Retrieval vs. Generation Failures Underlying Unsupported Claims

Disentangle the relative contributions of retrieval failures (citing irrelevant videos) and generation failures (producing unfaithful claims from relevant videos) to the unsupported video-grounded outputs of the Gemini 2.5 Pro multimodal generative search system. Identify why specific generated claims depart from the content of the videos they cite.

Background

Because the study is a black-box audit of Gemini 2.5 Pro's end-to-end behavior without access to internal mechanisms, it cannot separate errors arising from the retrieval stage from those arising during generation.

Understanding whether unsupported claims primarily result from citing irrelevant sources or from unfaithful synthesis of relevant sources is critical for diagnosing and improving system reliability.
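The distinction above can be made concrete with a small attribution rule. The following is a minimal sketch, not the paper's method: it assumes each generated claim carries two human labels, whether the cited video supports the claim and whether the cited video is relevant to the query. The names ClaimAnnotation, attribute_failure, and failure_shares are hypothetical, and the relevance label is exactly the kind of signal that the black-box audit could not observe directly.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimAnnotation:
    """Hypothetical human labels for one generated claim and its cited video."""
    claim_supported: bool  # does the cited video support the claim?
    video_relevant: bool   # is the cited video relevant to the query?


def attribute_failure(annotation: ClaimAnnotation) -> str:
    """Assign one claim to a failure category (assumed rule, for illustration)."""
    if annotation.claim_supported:
        return "supported"
    if not annotation.video_relevant:
        # Unsupported claim citing an irrelevant video: retrieval failure.
        return "retrieval_failure"
    # Unsupported claim citing a relevant video: generation failure.
    return "generation_failure"


def failure_shares(annotations):
    """Return each failure type's share of all unsupported claims."""
    counts = Counter(attribute_failure(a) for a in annotations)
    unsupported = counts["retrieval_failure"] + counts["generation_failure"]
    if unsupported == 0:
        return {"retrieval_failure": 0.0, "generation_failure": 0.0}
    return {
        "retrieval_failure": counts["retrieval_failure"] / unsupported,
        "generation_failure": counts["generation_failure"] / unsupported,
    }


# Made-up labels for illustration only; these are not results from the paper.
example = [
    ClaimAnnotation(claim_supported=True, video_relevant=True),
    ClaimAnnotation(claim_supported=False, video_relevant=False),
    ClaimAnnotation(claim_supported=False, video_relevant=True),
    ClaimAnnotation(claim_supported=False, video_relevant=True),
]
shares = failure_shares(example)
```

Under this rule every unsupported claim falls into exactly one category, so the two shares sum to one whenever at least one unsupported claim exists. A real study would also need to handle partially relevant videos and disagreement between annotators, which this sketch ignores.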

References

This means we cannot determine why certain claims depart from source content, nor disentangle retrieval failures (citing an irrelevant video) from generation failures (producing an unfaithful claim from a relevant video).

Auditing the Reliability of Multimodal Generative Search (2604.00944 - Sahneh et al., 1 Apr 2026) in Discussion — Limitations