Don't lie to your friends: Learning what you know from collaborative self-play

Published 18 Mar 2025 in cs.LG and cs.CL (arXiv:2503.14481v2)

Abstract: To be helpful assistants, AI agents must be aware of their own capabilities and limitations. This includes knowing when to answer from parametric knowledge versus using tools, when to trust tool outputs, and when to abstain or hedge. Such capabilities are hard to teach through supervised fine-tuning because they require constructing examples that reflect the agent's specific capabilities. We therefore propose a radically new approach to teaching agents what they know: \emph{collaborative self-play}. We construct multi-agent collaborations in which the group is rewarded for collectively arriving at correct answers. The desired meta-knowledge emerges from the incentives built into the structure of the interaction. We focus on small societies of agents that have access to heterogeneous tools (corpus-specific retrieval), and therefore must collaborate to maximize their success while minimizing their effort. Experiments show that group-level rewards for multi-agent communities can induce policies that \emph{transfer} to improve tool use and selective prediction in settings where individual agents are deployed in isolation.

Summary

Collaborative Self-Play for Calibrated LLM Agents

The paper presents a novel approach to improving AI agents through a learning paradigm called collaborative self-play (CSP). It addresses critical issues in conversational AI: an agent's ability to gauge its own knowledge, to judge the trustworthiness of external tools, and to abstain or express uncertainty when warranted. These skills are hard to instill through conventional supervised learning, which requires static training examples that reflect an agent's specific capabilities. Instead, CSP uses dynamic multi-agent interactions in which agents collectively strive for a successful outcome, rewarding calibrated confidence and efficient tool use.

Core Concept

At the heart of this research lies the integration of collaborative self-play, a mechanism where agents engage in a multi-agent environment to achieve collective goals rather than individual outputs. Within this framework, agents form small societies, each endowed with distinct tools (pertinent to corpus-specific retrieval), and are incentivized to collaborate effectively to maximize success while minimizing unnecessary effort.
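The incentive structure described above can be sketched as a group-level reward that scores the society on collective correctness while charging for effort. This is an illustrative assumption, not the paper's implementation; the function name and cost weight are invented:

```python
def group_reward(answers, gold, tool_calls, cost=0.1):
    """Group-level reward: the society is scored on whether it
    collectively reaches the correct answer, minus a small penalty
    for every tool call any member made."""
    correct = 1.0 if gold in answers else 0.0
    return correct - cost * tool_calls
```

Under such a reward, a rollout in which one agent retrieves once and the group answers correctly scores higher than one that needs three retrievals for the same answer, so agents are pushed toward delegating to whichever member holds the relevant tool.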

Experimental Setup

The paper evaluates on two datasets, BioASQ and PopQA: factoid question answering benchmarks that mirror real-world scenarios where agents must decide between relying on parametric knowledge and using external retrieval. Through CSP, agents are trained via Reinforced Self-Training (ReST), an iterative process that fine-tunes agents on the most successful rollouts from these multi-agent interactions.
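One ReST-style iteration can be sketched as follows. The `sample` and `reward` callables are placeholders for the real system's multi-agent rollout generation and group-level scoring; the threshold and function names are assumptions for illustration:

```python
def rest_iteration(policy, prompts, sample, reward, threshold=0.5):
    """One Reinforced Self-Training step: sample rollouts from the
    current policy, keep only those whose group reward clears the
    threshold, and return them as the next fine-tuning dataset."""
    dataset = []
    for prompt in prompts:
        rollout = sample(policy, prompt)   # simulate a multi-agent interaction
        if reward(rollout) >= threshold:   # filter on group-level reward
            dataset.append((prompt, rollout))
    return dataset
```

Iterating this loop (sample, filter, fine-tune, repeat) concentrates the training distribution on the behaviors that earned high collective reward.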

Experimental Results

Results indicate significant advances in calibrated decision-making and selective tool use compared to classical in-context learning (ICL).

  • Task Performance: CSP agents demonstrated superior F1 scores, especially when parametric and retrieved knowledge were complementary. The gains were largest in mismatched retrieval environments, where ICL struggled with misleading retrieved data, attesting to the robustness of CSP.
  • Effort Reduction: CSP-trained agents significantly reduced the frequency of unnecessary search queries, outperforming ICL by achieving similar or higher accuracy rates through fewer tool calls.
  • Answer and Search Calibration: Agents learned when to search and when to rely on parametric knowledge, optimizing their response strategy. CSP exhibited improved calibration in P(SEARCH) by strategically leveraging retrieval only when likely to add informational value.
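The answer/search calibration described above amounts to a three-way decision rule. A hedged sketch follows; the confidence estimate, thresholds, and action names are illustrative stand-ins, not the paper's learned policy:

```python
def decide_action(confidence, answer_threshold=0.6, abstain_threshold=0.2):
    """Selective prediction with tool use: answer from parametric
    knowledge when confident, search when retrieval is likely to add
    informational value, and abstain when even retrieval is unlikely
    to help."""
    if confidence >= answer_threshold:
        return "ANSWER"    # trust parametric knowledge; save a tool call
    if confidence >= abstain_threshold:
        return "SEARCH"    # retrieval likely resolves the uncertainty
    return "ABSTAIN"       # hedge rather than guess or waste effort
```

In CSP, a rule like this is not hand-set; the thresholds emerge implicitly because the group reward penalizes both wrong answers and unnecessary tool calls.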

Game-Theoretic Analysis

The paper also provides a theoretical underpinning using a simplified two-player game model. This model captures the incentives for players to calibrate their responses effectively, driving CSP's approach to encourage truthful communication and efficient tool usage.
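The flavor of that analysis can be conveyed with a toy expected-reward computation. The probabilities, costs, and the bluff-versus-hedge framing below are invented for illustration and simplify the paper's actual model:

```python
def expected_group_reward(p_know, hedge_when_unsure,
                          fallback_acc=0.7, wrong_penalty=1.0):
    """Expected group reward in a two-player exchange: player one
    knows the answer with probability p_know.  If it hedges when
    unsure, player two falls back on its own tools (accuracy
    fallback_acc); if it bluffs, a wrong answer is passed on and
    penalized."""
    if hedge_when_unsure:
        return p_know * 1.0 + (1 - p_know) * fallback_acc
    return p_know * 1.0 - (1 - p_know) * wrong_penalty

# With p_know = 0.5, hedging yields 0.85 while bluffing yields 0.0,
# so truthful uncertainty signalling maximizes the shared reward.
```

Whenever the partner's fallback accuracy exceeds the negative value of a bluffed wrong answer, the shared reward makes honest hedging the dominant strategy, which is exactly the incentive CSP exploits.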

Implications and Future Directions

The research suggests promising avenues for CSP, especially in expanding AI collaboration skills that are essential for real-world applications. There is potential for CSP to be adapted to other AI training contexts, including agent specialization and adaptive user preferences. Moreover, the framework opens discussions about unsupervised learning dynamics and self-play in low-resource settings.

In conclusion, this paper presents a compelling exploration of CSP as a mechanism for fostering calibrated meta-knowledge in AI agents, surpassing traditional methodologies by promoting organic learning through interaction and collaboration. Its success in achieving calibrated agent behaviors marks a significant step toward more reliable AI systems capable of sophisticated decision-making in varied environments.
