Distinguishing mirage-based from genuine visual reasoning

Establish reliable, operational criteria and detection methods that can distinguish mirage-based reasoning from genuine image-grounded reasoning in large multimodal models’ explanations and chain-of-thought traces during visual question answering and related multimodal tasks.

Background

The paper shows that frontier multimodal models often produce detailed visual descriptions and correct answers even when no image is provided, a behavior the authors term mirage reasoning. In many evaluations, the reasoning traces produced in mirage mode appear indistinguishable from those generated with real images.

Because the same models’ explanations look visually grounded whether or not an image is present, the authors note that mirage-based and genuine visual reasoning cannot be readily told apart from the generated justifications alone. This leaves a critical gap in both evaluation and safety.
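One way to make this gap concrete is an image-ablation check: ask the same question with and without the image and measure how much the two reasoning traces overlap. The sketch below is illustrative only and is not the authors' method. The function names, the token-level Jaccard measure, and the 0.6 threshold are all assumptions chosen for simplicity. A high overlap score only suggests that a trace did not depend on the image; it does not establish that the reasoning was mirage-based.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase a trace and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two traces (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty traces are treated as identical
    return len(ta & tb) / len(ta | tb)


def flag_mirage_candidates(pairs, threshold=0.6):
    """Return (question_id, score) for pairs whose traces overlap heavily.

    `pairs` is a hypothetical list of tuples:
        (question_id, trace_with_image, trace_without_image)
    `threshold` is an arbitrary illustrative cutoff, not a validated value.
    """
    flagged = []
    for qid, with_img, without_img in pairs:
        score = jaccard(with_img, without_img)
        if score >= threshold:
            flagged.append((qid, score))
    return flagged


if __name__ == "__main__":
    example_pairs = [
        ("q1",
         "The scan shows an opacity in the left lower lobe.",
         "The scan shows an opacity in the left lower lobe."),
        ("q2",
         "A red car is parked on the left.",
         "No image was provided, so nothing can be described."),
    ]
    for qid, score in flag_mirage_candidates(example_pairs):
        print(qid, round(score, 2))
```

Lexical overlap is a deliberately crude proxy. A real detection method would likely need stronger signals, since the problem statement is precisely that surface-level justifications look alike in both conditions.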

References

The distinction between mirage-based and visual thinking is unclear

— MIRAGE: The Illusion of Visual Understanding (2603.21687 - Asadi et al., 23 Mar 2026) in Subsection: "The distinction between mirage-based and visual thinking is unclear" (Section: Mirages give the illusion of visual understanding)
