Generalize COVR-based game to multi-party, multi-image conversations

Generalize the MT-PingEval instantiation of the COVR benchmark from two-player interactions with exactly two images to multi-party conversations involving larger numbers of images, specifying how multiple participants should communicate about their private images to collaboratively compute the answer.

Background

MT-PingEval adapts the COVR multimodal reasoning benchmark to a two-player private-information game: each player is shown one image, and the two must converse to answer the query. The authors restrict evaluation to COVR instances with exactly two images.

They explicitly leave the extension of this setup to multi-party conversations with larger numbers of images for future work. Such an extension would test interactive coordination among more than two agents and increase the complexity of sharing private information during collaborative reasoning.
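To make the generalization concrete, the sketch below shows one possible way to split a COVR-style instance with k images across n players and to fix a speaking order. Everything in it is a hypothetical illustration, not the MT-PingEval implementation: the class name `MultiPartyInstance`, the round-robin image assignment, the requirement that every player holds at least one image, and the round-robin turn order are all assumptions. The two-player, two-image setting evaluated in the paper falls out as the special case n = 2, k = 2.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MultiPartyInstance:
    """Hypothetical container for one COVR-style query split across players."""
    query: str
    images: List[str]  # image identifiers; each is private to exactly one player
    num_players: int
    assignment: Dict[int, List[str]] = field(init=False)

    def __post_init__(self) -> None:
        # Assumption: at least two players, and every player holds >= 1 image.
        if not 2 <= self.num_players <= len(self.images):
            raise ValueError("need 2 <= num_players <= number of images")
        # Assumed round-robin split: image i goes to player i % num_players,
        # so every image is seen by exactly one player.
        self.assignment = {p: [] for p in range(self.num_players)}
        for i, img in enumerate(self.images):
            self.assignment[i % self.num_players].append(img)

    def turn_order(self, num_turns: int) -> List[int]:
        """Assumed round-robin speaking order; other protocols are possible."""
        return [t % self.num_players for t in range(num_turns)]


# Two players, two images: the setting MT-PingEval evaluates.
two = MultiPartyInstance("example query", ["img_a", "img_b"], num_players=2)
print(two.assignment)  # {0: ['img_a'], 1: ['img_b']}

# Three players, five images: one player ends up with a single image.
three = MultiPartyInstance("example query", ["a", "b", "c", "d", "e"], num_players=3)
print(three.assignment)  # {0: ['a', 'd'], 1: ['b', 'e'], 2: ['c']}
print(three.turn_order(5))  # [0, 1, 2, 0, 1]
```

Round-robin assignment and turn-taking are only the simplest baseline. A real extension would also need to decide whether players may address one another directly or only broadcast to the group, and who is responsible for committing to the final answer.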

References

We use data from the COVR benchmark, restricting ourselves to the examples with exactly two images; the generalization to multi-party conversations about larger numbers of images is left for future work.

MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games (2602.24188 - Eisenstein et al., 27 Feb 2026) in Section 3.2 (COVR)