Generalize COVR-based game to multi-party, multi-image conversations
Generalize the MT-PingEval instantiation of the COVR benchmark from two-player interactions over exactly two images to multi-party conversations over larger numbers of images. A solution should specify how multiple participants communicate about their private images to collaboratively compute the answer.
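To make the setup concrete, here is a minimal sketch of how images might be dealt to participants so that each image is private to exactly one of them. It is not taken from the paper: the `Player` class, the `deal_images` helper, the round-robin assignment, and the image identifiers are all hypothetical. It also assumes that in the existing two-player setting each player holds one of the two images. The communication protocol itself, which is the open part of the problem, is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    """A participant who can only see their own images (hypothetical type)."""
    name: str
    private_images: list[str] = field(default_factory=list)


def deal_images(image_ids: list[str], num_players: int) -> list[Player]:
    """Assign images round-robin so each image is private to exactly one player.

    Round-robin is an illustrative assumption, not the paper's method.
    """
    if num_players < 2:
        raise ValueError("a collaborative game needs at least two players")
    if len(image_ids) < num_players:
        raise ValueError("each player needs at least one private image")
    players = [Player(name=f"player_{i}") for i in range(num_players)]
    for idx, image_id in enumerate(image_ids):
        players[idx % num_players].private_images.append(image_id)
    return players


# Existing setting (assumed): two players, two images, one image each.
two_party = deal_images(["img_a", "img_b"], num_players=2)

# Hypothetical generalization: three players sharing five images.
multi_party = deal_images(
    ["img_a", "img_b", "img_c", "img_d", "img_e"], num_players=3
)
```

Under this sketch the current benchmark setting is the special case `num_players=2` with two images. Other assignment schemes (uneven hands, overlapping views) are equally plausible, and choosing among them is part of what the problem leaves open.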
References
We use data from the COVR benchmark, restricting ourselves to the examples with exactly two images; the generalization to multi-party conversations about larger numbers of images is left for future work.
— MT-PingEval: Evaluating Multi-Turn Collaboration with Private Information Games
(2602.24188 - Eisenstein et al., 27 Feb 2026) in Section 3.2 (COVR)