Agent Context Protocols Enhance Collective Inference
Abstract: AI agents have become increasingly adept at complex tasks such as coding, reasoning, and multimodal understanding. However, building generalist systems requires moving beyond individual agents to collective inference -- a paradigm where multi-agent systems with diverse, task-specialized agents complement one another through structured communication and collaboration. Today, coordination is usually handled with imprecise, ad-hoc natural language, which limits complex interaction and hinders interoperability with domain-specific agents. We introduce Agent Context Protocols (ACPs): a domain- and agent-agnostic family of structured protocols for agent-agent communication, coordination, and error handling. ACPs combine (i) persistent execution blueprints -- explicit dependency graphs that store intermediate agent outputs -- with (ii) standardized message schemas, enabling robust and fault-tolerant multi-agent collective inference. ACP-powered generalist systems reach state-of-the-art performance: 28.3% accuracy on AssistantBench for long-horizon web assistance and best-in-class multimodal technical reports, outperforming commercial AI systems in human evaluation. ACPs are highly modular and extensible, allowing practitioners to build top-tier generalist agents quickly.
Glossary
- Ablation Study: An experiment that removes or alters components of a system to isolate their impact on performance. "Ablation Study: The Importance of Coordination and Fault Tolerance"
- Active Inference Framework: A theoretical framework where agents minimize uncertainty (free energy) to model perception, action, and collective intelligence. "formalize collective intelligence via the Active Inference Framework"
- AGENT_REQUEST: A standardized, structured input message schema that an agent uses to prepare and send tool invocations. "Standardized message schemas (AGENT_REQUEST, AGENT_RESPONSE, ASSISTANCE_REQUEST) govern information exchange between agents and tools."
- AGENT_RESPONSE: A standardized, structured output message schema that encapsulates results from a tool invocation for downstream use. "Standardized message schemas (AGENT_REQUEST, AGENT_RESPONSE, ASSISTANCE_REQUEST) govern information exchange between agents and tools."
- Agent Context Protocols (ACPs): A domain-agnostic set of structured protocols for agent-agent communication, coordination, and error handling to enable robust multi-agent execution. "In this work, we introduce Agent Context Protocols (ACPs), a domain and agent-agnostic set of structured protocols for agent-agent communication, coordination, and error handling."
- ASSISTANCE_REQUEST: A protocol message that signals errors or missing information, requesting help or re-planning to proceed. "Standardized message schemas (AGENT_REQUEST, AGENT_RESPONSE, ASSISTANCE_REQUEST) govern information exchange between agents and tools."
- AssistantBench: A benchmark of realistic, long-horizon web tasks for evaluating agent browsing, planning, and aggregation capabilities. "AssistantBench \citep{yoran2024assistantbench} is a benchmark designed to evaluate how well AI agents can perform realistic, web-based tasks that require browsing, planning, and aggregating information."
- BrowserTool: A tool capability for retrieving up-to-date information from the web for use by agents. "BrowserTool for retrieving up-to-date information from the web"
- Collective inference: A paradigm where multiple specialized agents collaborate and communicate to solve complex tasks more effectively than a single agent. "collective inference---a paradigm where multi-agent systems with diverse, task-specialized agents complement each other through communication and collaboration."
- Directed acyclic graph (DAG): A graph with directed edges and no cycles, used here to encode data and execution dependencies among sub-tasks. "These sub-tasks have data dependencies forming a directed acyclic graph (DAG)."
- Execution Blueprint: The persistent, global DAG of fine-grained tool calls and their dependencies, serving as both plan and memory of intermediate outputs. "Collecting all these fine-grained steps across sub-tasks yields a global DAG, referred to as the Execution Blueprint."
- Fault-tolerance agent: A specialized agent that reacts to ASSISTANCE_REQUESTs, updates the plan, and reroutes or marks failures to preserve overall progress. "A specialized fault-tolerance agent then updates accordingly."
- Final coordination layer: A system layer that aggregates validated outputs into the final deliverable (e.g., formatted answers, reports). "The final coordination layer, specific for AssistantBench, is used to effectively manage and synthesize the outputs in a format expected by AssistantBench."
- Linear Temporal Logic (LTL): A formal logic for specifying temporal behaviors and constraints in task planning and coordination. "Fang and Kress-Gazit~\citep{fang2024high} propose a task grammar using Linear Temporal Logic (LTL) to support collaboration among heterogeneous agents"
- LLM-based agents: Autonomous components built on LLMs that plan, reason, and invoke tools to perform tasks. "Denote a team of LLM-based agents by ."
- Long-horizon: Describing tasks or workflows that span many sequential steps and require sustained coordination over time. "long-horizon web assistance"
- Model Context Protocol (MCP): A specification for structured, context-aware communication between an AI model and external tools/data sources. "For instance, such protocols for single-agent like model context protocol (MCP) \citep{mcp} have enabled context-aware reasoning at scale through seamless communication between AI agents and data sources."
- PlotVisualizationTool: A tool capability for generating plots or charts from queried data for inclusion in outputs. "PlotVisualizationTool for generating plots or charts based on queried data."
- ReAct: An agent methodology that interleaves reasoning and acting (tool use) within a single agent loop. "A singular ReAct \citep{yao2023react} must aim to resolve the user query, referred to as the Single Agent baseline."
- Standardized error codes: Uniform codes used to classify and localize failures during execution, enabling targeted recovery. "Fault tolerance is maintained via standardized error codes, so that sub-task failures or exceptions can be localized and addressed without collapsing the entire workflow."
- Standardized message schemas: Predefined structures for agent-tool and inter-agent communication that ensure consistency and interoperability. "Standardized message schemas (AGENT_REQUEST, AGENT_RESPONSE, ASSISTANCE_REQUEST) govern information exchange between agents and tools."
- Status codes: Descriptive execution indicators (akin to HTTP) attached to tool responses and errors to guide diagnosis and re-planning. "ACPs introduce standardized descriptive status codes (akin to HTTP \citep{http}) and structured context-rich error messages that work with enhanced reasoner LLMs to re-plan and recover."
- TOOL_CALL: The protocol phase where a prepared request is dispatched to an external tool or API for execution. "TOOL_CALL (Execution)."
- TOOL_RESPONSE: The structured form of a tool’s raw output, including a status code and extracted fields needed by downstream steps. "structures it into a TOOL_RESPONSE that includes a status code, any relevant output variables, and any values on which subsequent sub-tasks depend."
- Topological order: An ordering of DAG nodes such that each node appears after its dependencies, used to schedule sub-tasks safely. "Sub-tasks are then executed in a topological order, ensuring that prerequisites complete before downstream sub-tasks begin."
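Several of the glossary terms (Execution Blueprint, topological order, status codes, ASSISTANCE_REQUEST) describe one execution loop, which can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the class names `Step`, `ToolResponse`, and `AssistanceRequest`, the function `run_blueprint`, and the specific codes 200, 424, and 500 are all hypothetical choices made here. The glossary says only that status codes are "akin to HTTP". The sketch stores each step's output in the blueprint's result table, runs steps in topological order, and localizes a failure to the step that raised it instead of aborting the whole workflow.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ToolResponse:
    """Hypothetical structured tool output with an HTTP-like status code."""
    status: int          # assumed codes: 200 ok, 424 blocked by a failed dependency, 500 tool error
    output: Any = None


@dataclass
class AssistanceRequest:
    """Hypothetical message signalling that a step needs help or re-planning."""
    step: str
    status: int
    detail: str


@dataclass
class Step:
    """One node of the blueprint: a tool call plus the steps it depends on."""
    tool: Callable[..., Any]
    deps: List[str] = field(default_factory=list)


def run_blueprint(
    blueprint: Dict[str, Step],
) -> Tuple[Dict[str, ToolResponse], List[AssistanceRequest]]:
    # Map each step to its predecessors; static_order() yields prerequisites first.
    order = TopologicalSorter(
        {name: step.deps for name, step in blueprint.items()}
    ).static_order()
    results: Dict[str, ToolResponse] = {}
    requests: List[AssistanceRequest] = []
    for name in order:
        step = blueprint[name]
        # A step whose prerequisite did not succeed is marked blocked, not executed.
        if any(results[d].status != 200 for d in step.deps):
            results[name] = ToolResponse(status=424)
            continue
        try:
            inputs = [results[d].output for d in step.deps]
            results[name] = ToolResponse(status=200, output=step.tool(*inputs))
        except Exception as exc:
            # Localize the failure: record it and keep running independent steps.
            results[name] = ToolResponse(status=500)
            requests.append(AssistanceRequest(name, 500, str(exc)))
    return results, requests


# Toy blueprint: "plot" fails on purpose, so "report" is blocked while "sort" succeeds.
blueprint = {
    "search": Step(lambda: [3, 1, 2]),
    "sort": Step(sorted, deps=["search"]),
    "plot": Step(lambda xs: 1 / 0, deps=["sort"]),
    "report": Step(lambda p: p, deps=["plot"]),
}
results, requests = run_blueprint(blueprint)
```

In a fuller system, a fault-tolerance agent would consume the collected `AssistanceRequest` objects and revise the blueprint. Here they are simply returned so the caller can see which step failed and why.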