Agent Context Protocols Enhance Collective Inference

Published 20 May 2025 in cs.AI, cs.CL, and cs.LG | (2505.14569v1)

Abstract: AI agents have become increasingly adept at complex tasks such as coding, reasoning, and multimodal understanding. However, building generalist systems requires moving beyond individual agents to collective inference -- a paradigm where multi-agent systems with diverse, task-specialized agents complement one another through structured communication and collaboration. Today, coordination is usually handled with imprecise, ad-hoc natural language, which limits complex interaction and hinders interoperability with domain-specific agents. We introduce Agent Context Protocols (ACPs): a domain- and agent-agnostic family of structured protocols for agent-agent communication, coordination, and error handling. ACPs combine (i) persistent execution blueprints -- explicit dependency graphs that store intermediate agent outputs -- with (ii) standardized message schemas, enabling robust and fault-tolerant multi-agent collective inference. ACP-powered generalist systems reach state-of-the-art performance: 28.3% accuracy on AssistantBench for long-horizon web assistance and best-in-class multimodal technical reports, outperforming commercial AI systems in human evaluation. ACPs are highly modular and extensible, allowing practitioners to build top-tier generalist agents quickly.

Summary

The paper introduces Agent Context Protocols (ACPs) to standardize multi-agent communication and coordination for scalable, fault-tolerant collective inference.
The methodology decomposes complex tasks into DAG-based sub-tasks with structured message formats, enabling efficient error recovery.
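Empirical evaluations report 28.3% accuracy on AssistantBench, top human ratings for multimodal reports against Perplexity and Gemini baselines, and a dashboard-creation ablation score of 3.95 versus 2.94 without assistance and 1.96 for a single agent.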

Structured Protocols for Multi-Agent Collective Inference

Motivation and Context

Recent advances in LLM-based AI agents have yielded systems proficient in specialized tasks such as coding, complex reasoning, and multimodal data synthesis. However, the construction of robust generalist systems demands seamless collaboration among heterogeneous agents, whose task interdependencies often necessitate sophisticated coordination and fault tolerance—gaps not adequately bridged by ad-hoc natural language communication. The lack of standardized mechanisms for interoperability, error handling, and structured execution in multi-agent settings presents critical barriers to scaling collective inference systems.

Agent Context Protocols: Design and Architecture

To address these challenges, the paper introduces Agent Context Protocols (ACPs), a domain-agnostic, modular schema governing agent-agent communication, coordination, and error resolution. ACPs rest on two primary abstractions: the Execution Blueprint (a DAG encoding sub-task dependencies and status) and structured message formats for inter-agent and agent-tool interactions. Each complex task $T$ is decomposed into atomic sub-tasks $\{\tau_i\}$, with individual agents $A_i$ assigned operations $\mathcal{O}_i$ according to their capabilities; dependencies in $T$ propagate through the Execution Blueprint, which stores both intermediate states and agent outputs.
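The decomposition described above can be illustrated with a minimal Python sketch. It assumes a blueprint is a set of named sub-tasks with explicit dependencies, executed in topological order, with each output stored on its node for downstream use. The names `SubTask` and `ExecutionBlueprint`, the status strings, and the toy tasks are hypothetical and are not taken from the paper's implementation; `graphlib` requires Python 3.9 or later.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


@dataclass
class SubTask:
    """One blueprint node: a sub-task, its prerequisites, and its stored state."""
    name: str
    run: Callable[[Dict[str, Any]], Any]  # receives the outputs of its dependencies
    depends_on: List[str] = field(default_factory=list)
    status: str = "pending"               # hypothetical states: pending -> done
    output: Any = None


class ExecutionBlueprint:
    """Hypothetical persistent DAG holding sub-task status and intermediate outputs."""

    def __init__(self, tasks: List[SubTask]) -> None:
        self.tasks = {t.name: t for t in tasks}

    def order(self) -> List[str]:
        # Map each node to its predecessors; static_order() yields prerequisites first.
        graph = {t.name: set(t.depends_on) for t in self.tasks.values()}
        return list(TopologicalSorter(graph).static_order())

    def execute(self) -> Dict[str, Any]:
        for name in self.order():
            task = self.tasks[name]
            inputs = {d: self.tasks[d].output for d in task.depends_on}
            task.output = task.run(inputs)  # intermediate output is kept on the node
            task.status = "done"
        return {n: t.output for n, t in self.tasks.items()}


# Toy three-step task: search, then filter, then report.
blueprint = ExecutionBlueprint([
    SubTask("search", lambda _: ["gym A", "gym B"]),
    SubTask("filter", lambda inp: [g for g in inp["search"] if g.endswith("A")], ["search"]),
    SubTask("report", lambda inp: f"Found: {', '.join(inp['filter'])}", ["filter"]),
])
results = blueprint.execute()
```

Because outputs stay attached to their nodes, a later step (or a re-run after a failure) can read upstream results without recomputing them, which is the role the summary attributes to the persistent blueprint.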

Agents interact via protocol-governed message types:

AGENT_REQUEST: Structured data for tool invocation, aggregating LLM-generated and tool-derived inputs with strict validation.
AGENT_RESPONSE: Standardized output specifications with status codes and downstream variables.
ASSISTANCE_REQUEST: Context-rich error reports issued on detection of invalid/missing/incomplete data, leveraging descriptive error codes akin to HTTP semantics.
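As a rough illustration of the three message types, the sketch below models them as Python dataclasses and shows a handler that either returns a response or escalates with an assistance request. The field names, the `handle` function, and the numeric status values are assumptions chosen for illustration; the source says only that the codes are descriptive and akin to HTTP semantics.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Any, Callable, Dict, List, Union


class Status(IntEnum):
    """Illustrative HTTP-like codes; the numeric values are assumptions."""
    OK = 200
    INVALID_INPUT = 400
    TOOL_FAILURE = 500


@dataclass
class AgentRequest:
    sub_task_id: str
    tool: str
    params: Dict[str, Any]
    required: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        # Validation step: report required parameters that are absent or empty.
        return [k for k in self.required if self.params.get(k) in (None, "")]


@dataclass
class AgentResponse:
    sub_task_id: str
    status: Status
    outputs: Dict[str, Any] = field(default_factory=dict)  # downstream variables


@dataclass
class AssistanceRequest:
    sub_task_id: str
    status: Status
    message: str
    context: Dict[str, Any] = field(default_factory=dict)


def handle(
    request: AgentRequest, tool: Callable[..., Dict[str, Any]]
) -> Union[AgentResponse, AssistanceRequest]:
    """Validate the request, call the tool, and wrap the result or the error."""
    missing = request.missing_fields()
    if missing:
        return AssistanceRequest(
            request.sub_task_id, Status.INVALID_INPUT,
            f"missing fields: {missing}", {"params": request.params},
        )
    try:
        return AgentResponse(request.sub_task_id, Status.OK, tool(**request.params))
    except Exception as exc:  # any tool error becomes a structured report
        return AssistanceRequest(
            request.sub_task_id, Status.TOOL_FAILURE, str(exc), {"tool": request.tool}
        )


def search(query: str) -> Dict[str, Any]:
    """Stand-in tool used only for this example."""
    return {"hits": [query]}


ok = handle(AgentRequest("t1", "search", {"query": "gyms"}, ["query"]), search)
bad = handle(AgentRequest("t2", "search", {}, ["query"]), search)
```

Here `ok` is an `AgentResponse` carrying the tool output, while `bad` is an `AssistanceRequest` that names the missing field, so a downstream component can decide how to recover without parsing free-form text.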

This architecture provides robust mechanisms for tracking progress, diagnosing faults, and dynamic re-planning, all while adhering to the global Execution Blueprint. The persistence of the blueprint enables efficient error isolation and facilitates both parallel and serial task execution by specialized agents.
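One way to picture error isolation on a persistent blueprint is the sketch below. It assumes a simplified policy: a failed sub-task is marked `failed`, only its descendants are marked `skipped`, and independent branches still complete. This is not the paper's fault-tolerance agent, which also re-plans; the function `run_with_isolation`, the status strings, and the travel-planning tasks are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, Set, Tuple

Action = Callable[[Dict[str, Any]], Any]


def run_with_isolation(
    deps: Dict[str, Set[str]], actions: Dict[str, Action]
) -> Tuple[Dict[str, str], Dict[str, Any], Dict[str, str]]:
    """Run a DAG in topological order, confining each failure to its descendants."""
    status: Dict[str, str] = {}
    outputs: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    for name in TopologicalSorter(deps).static_order():
        parents = deps.get(name, set())
        # A node runs only if every prerequisite finished successfully.
        if any(status[p] != "done" for p in parents):
            status[name] = "skipped"
            continue
        try:
            outputs[name] = actions[name]({p: outputs[p] for p in parents})
            status[name] = "done"
        except Exception as exc:
            status[name] = "failed"
            errors[name] = f"{type(exc).__name__}: {exc}"
    return status, outputs, errors


def failing_hotel_search(_: Dict[str, Any]) -> Any:
    """Simulated tool failure for the example."""
    raise TimeoutError("hotel API timed out")


deps = {
    "flights": set(),
    "hotels": set(),
    "weather": set(),
    "itinerary": {"flights", "hotels"},
}
actions = {
    "flights": lambda _: ["F1"],
    "hotels": failing_hotel_search,
    "weather": lambda _: "sunny",
    "itinerary": lambda inp: {"flight": inp["flights"][0], "hotel": inp["hotels"][0]},
}
status, outputs, errors = run_with_isolation(deps, actions)
```

In this run the hotel failure blocks only the itinerary step; the flight and weather results remain available, so a recovery step would need to redo just the failed node and its dependents.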

Figure 1: Overview of the ACP-based system workflow. A complex task is decomposed, executed as a DAG, and coordinated via structured messages and fault-tolerant logic.

Empirical Evaluation

Experiments span three axes: benchmarked web assistance (AssistantBench), multimodal report synthesis, and dashboard creation, the last with controlled ablations that evaluate coordination and fault tolerance.

Web Assistance: AssistantBench Performance

On the AssistantBench benchmark, ACP-powered multi-agent systems achieve 28.3% overall accuracy, surpassing both generalist and specialist agents, including those built on more sophisticated base models. When restricted to a minimal toolset, the ACP system remains competitive (24.8% accuracy), indicating that the protocol layer itself supports coherent long-horizon reasoning and extensibility without re-training. By difficulty level, the system reaches 48.5% accuracy on medium tasks and 15.5% on hard tasks. Because domain-specific tools plug in through standardized interfaces, new capabilities can be added and the system adapted quickly.

Multimodal Report Generation

The architecture enables the synthesis of complex multi-agent outputs in domains including Finance, Technology, Healthcare, Automobile, and Real Estate. Coordinated agent workflows generate highly structured multimodal reports that integrate textual analysis, data visualizations, and curated citations. Human evaluators rated ACP-based reports highest on every assessed dimension (Coverage, Presentation Quality, Depth, Clarity) compared with Perplexity and Gemini baselines. The high presentation and coverage scores are attributed to robust inter-agent communication and context-preserving execution over lengthy workflows.

Figure 2: Sample multimodal report segments generated by ACP-based agents, demonstrating integration of text, visuals, and citations in complex documents.

Dashboard Creation: Coordination and Fault Tolerance Ablation

A synthetic dashboard dataset stratified by complexity was used to quantify the marginal gains from task decomposition and protocol-driven coordination. Three setups were compared: Single Agent, No Assistance (multi-agent, no protocol), and full ACP. In human evaluation, the full ACP setup received an overall score of 3.95, versus 2.94 for No Assistance and 1.96 for Single Agent. Level 3 (highest complexity) tasks benefit most from coordinated execution, with error handling and dynamic re-routing sharply reducing workflow collapse and keeping outputs consistent. Structured error codes and assistance requests localize faults and allow partial execution, which keeps deep workflows productive even when individual steps fail.
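To put the three overall scores in proportion, the short calculation below computes the relative gain of the full ACP setup over each baseline. Only the three scores come from the text; the helper name `relative_gain` is hypothetical, and because the rating scale is not stated here, the percentages should be read as ratios of raw scores rather than as calibrated effect sizes.

```python
# Overall human-evaluation scores reported for the dashboard ablation.
scores = {"ACP": 3.95, "No Assistance": 2.94, "Single Agent": 1.96}


def relative_gain(new: float, base: float) -> float:
    """Percentage improvement of `new` over `base`."""
    return (new - base) / base * 100


gains = {
    name: round(relative_gain(scores["ACP"], base), 1)
    for name, base in scores.items()
    if name != "ACP"
}

for name, gain in gains.items():
    print(f"ACP vs. {name}: +{gain}%")
```

On these figures, the full ACP setup scores roughly a third higher than the multi-agent system without assistance and about twice the single-agent baseline.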

Figure 3: Execution timeline for a complex travel planning task, highlighting parallel and sequential agent activity, and the ACP-based protocol’s error recovery mechanisms.

Comparative Perspective and Related Work

ACPs improve on prior agent orchestration frameworks (AutoGen, Magentic-One, MetaGPT, etc.) by formalizing inter-agent dialogue and execution through a single extensible protocol, rather than relying solely on natural language or weakly structured operational roles. Drawing inspiration from single-agent context protocols (e.g., Model Context Protocol), ACPs operationalize persistent blueprints and schema-driven interaction for multi-agent collectives. Related work in collaborative robotics, task grammars, and active inference has explored aspects of interoperability and reasoning; ACPs bring these threads together to tackle long-horizon, domain-diverse, and error-prone environments, providing a scalable substrate for generalist AI systems.

Implications and Future Directions

The introduction of ACPs sets a new standard for scalable, interpretable, and resilient multi-agent systems. Practically, ACPs give practitioners a highly modular template for rapid prototyping and deployment of agent collectives, with clear paths for domain specialization and capability expansion via plug-and-play tools. Theoretically, ACPs open research into collective intelligence, scaling behaviors under protocol constraints, and compositional generalization. Promising avenues for further work include scaling to larger agent populations, incorporating higher-order reasoning agents for global re-planning, and extending the protocols to dynamic, non-stationary environments.

Conclusion

Agent Context Protocols present a rigorously structured foundation for robust multi-agent communication, coordination, and error management, enabling efficient, fault-tolerant collective inference. Empirical results validate ACPs as a critical enabler for generalist AI, outperforming existing systems on complex benchmarks and generation tasks. This protocol-driven approach streamlines the composition of reliable, extensible agent teams, pushing the field toward interpretable and scalable collaboration in practical AI deployments.

Glossary

Ablation Study: An experiment that removes or alters components of a system to isolate their impact on performance. "Ablation Study: The Importance of Coordination and Fault Tolerance"
Active Inference Framework: A theoretical framework where agents minimize uncertainty (free energy) to model perception, action, and collective intelligence. "formalize collective intelligence via the Active Inference Framework"
AGENT_REQUEST: A standardized, structured input message schema that an agent uses to prepare and send tool invocations. "Standardized message schemas (AGENT_REQUEST, AGENT_RESPONSE, ASSISTANCE_REQUEST) govern information exchange between agents and tools."
AGENT_RESPONSE: A standardized, structured output message schema that encapsulates results from a tool invocation for downstream use. "Standardized message schemas (AGENT_REQUEST, AGENT_RESPONSE, ASSISTANCE_REQUEST) govern information exchange between agents and tools."
Agent Context Protocols (ACPs): A domain-agnostic set of structured protocols for agent-agent communication, coordination, and error handling to enable robust multi-agent execution. "In this work, we introduce Agent Context Protocols (ACPs), a domain and agent-agnostic set of structured protocols for agent-agent communication, coordination, and error handling."
ASSISTANCE_REQUEST: A protocol message that signals errors or missing information, requesting help or re-planning to proceed. "Standardized message schemas (AGENT_REQUEST, AGENT_RESPONSE, ASSISTANCE_REQUEST) govern information exchange between agents and tools."
AssistantBench: A benchmark of realistic, long-horizon web tasks for evaluating agent browsing, planning, and aggregation capabilities. "AssistantBench \citep{yoran2024assistantbench} is a benchmark designed to evaluate how well AI agents can perform realistic, web-based tasks that require browsing, planning, and aggregating information."
BrowserTool: A tool capability for retrieving up-to-date information from the web for use by agents. "BrowserTool for retrieving up-to-date information from the web"
Collective inference: A paradigm where multiple specialized agents collaborate and communicate to solve complex tasks more effectively than a single agent. "collective inference---a paradigm where multi-agent systems with diverse, task-specialized agents complement each other through communication and collaboration."
Directed acyclic graph (DAG): A graph with directed edges and no cycles, used here to encode data and execution dependencies among sub-tasks. "These sub-tasks have data dependencies forming a directed acyclic graph (DAG)."
Execution Blueprint: The persistent, global DAG of fine-grained tool calls and their dependencies, serving as both plan and memory of intermediate outputs. "Collecting all these fine-grained steps across sub-tasks yields a global DAG, referred to as the Execution Blueprint."
Fault-tolerance agent: A specialized agent that reacts to ASSISTANCE_REQUESTs, updates the plan, and reroutes or marks failures to preserve overall progress. "A specialized fault-tolerance agent then updates $\mathcal{G}$ accordingly."
Final coordination layer: A system layer that aggregates validated outputs into the final deliverable (e.g., formatted answers, reports). "The final coordination layer, specific for AssistantBench, is used to effectively manage and synthesize the outputs in a format expected by AssistantBench."
Linear Temporal Logic (LTL): A formal logic for specifying temporal behaviors and constraints in task planning and coordination. "Fang and Kress-Gazit~\citep{fang2024high} propose a task grammar using Linear Temporal Logic (LTL) to support collaboration among heterogeneous agents"
LLM-based agents: Autonomous components built on LLMs that plan, reason, and invoke tools to perform tasks. "Denote a team of $k$ LLM-based agents by $\mathcal{A} = \{A_1, \dots, A_k\}$."
Long-horizon: Describing tasks or workflows that span many sequential steps and require sustained coordination over time. "long-horizon web assistance"
Model Context Protocol (MCP): A specification for structured, context-aware communication between an AI model and external tools/data sources. "For instance, such protocols for single-agent like model context protocol (MCP) \citep{mcp} have enabled context-aware reasoning at scale through seamless communication between AI agents and data sources."
PlotVisualizationTool: A tool capability for generating plots or charts from queried data for inclusion in outputs. "PlotVisualizationTool for generating plots or charts based on queried data."
ReAct: An agent methodology that interleaves reasoning and acting (tool use) within a single agent loop. "A singular ReAct \citep{yao2023react} must aim to resolve the user query, referred to as the Single Agent baseline."
Standardized error codes: Uniform codes used to classify and localize failures during execution, enabling targeted recovery. "Fault tolerance is maintained via standardized error codes, so that sub-task failures or exceptions can be localized and addressed without collapsing the entire workflow."
Standardized message schemas: Predefined structures for agent-tool and inter-agent communication that ensure consistency and interoperability. "Standardized message schemas (AGENT_REQUEST, AGENT_RESPONSE, ASSISTANCE_REQUEST) govern information exchange between agents and tools."
Status codes: Descriptive execution indicators (akin to HTTP) attached to tool responses and errors to guide diagnosis and re-planning. "ACPs introduce standardized descriptive status codes (akin to HTTP \citep{http}) and structured context-rich error messages that work with enhanced reasoner LLMs to re-plan and recover."
TOOL_CALL: The protocol phase where a prepared request is dispatched to an external tool or API for execution. "TOOL_CALL (Execution)."
TOOL_RESPONSE: The structured form of a tool’s raw output, including a status code and extracted fields needed by downstream steps. "structures it into a TOOL_RESPONSE that includes a status code, any relevant output variables, and any values on which subsequent sub-tasks depend."
Topological order: An ordering of DAG nodes such that each node appears after its dependencies, used to schedule sub-tasks safely. "Sub-tasks are then executed in a topological order, ensuring that prerequisites complete before downstream sub-tasks begin."