Specify Intended Relational Structure via Text Prompts
Determine how text prompts can be used to specify which relational structure a user intends when a single image embodies several distinct relational structures, so that relational visual similarity systems such as the relsim model can unambiguously select the desired relational mapping for retrieval or generation tasks.
References
Last but not least, we acknowledge that one image can embody multiple different relational structures, potentially leading to multiple valid relational mappings. Determining how to use text prompts to specify which relational structure a user intends remains an open question.
— Relational Visual Similarity
(2512.07833 - Nguyen et al., 8 Dec 2025) in Conclusion and Discussion