Identify Relationally Salient Details from a Single Image for Anonymous Captioning
Determine which visual details in a single image are incidental and which are constitutive of the underlying relational pattern, so that an anonymous caption can abstract the image’s logic.
References
Writing a shared relational attribute from a single image is inherently challenging. For example, given only a sequence depicting a butterfly’s flight stages (Fig.\ref{fig:caption_by_group}, first row), it is unclear which visual details are irrelevant and which constitute the underlying relational pattern.
— Relational Visual Similarity
(2512.07833 - Nguyen et al., 8 Dec 2025) in Relational Visual Similarity — Creating a Relational Dataset — Generating anonymous captions