Anemoi: A Semi-Centralized Multi-agent System Based on Agent-to-Agent Communication MCP server from Coral Protocol
Abstract: Recent advances in generalist multi-agent systems (MAS) have largely followed a context-engineering plus centralized paradigm, where a planner agent coordinates multiple worker agents through unidirectional prompt passing. While effective under strong planner models, this design suffers from two critical limitations: (1) strong dependency on the planner's capability, which leads to degraded performance when a smaller LLM powers the planner; and (2) limited inter-agent communication, where collaboration relies on costly prompt concatenation and context injection, introducing redundancy and information loss. To address these challenges, we propose Anemoi, a semi-centralized MAS built on the Agent-to-Agent (A2A) communication MCP server from Coral Protocol. Unlike traditional designs, Anemoi enables structured and direct inter-agent collaboration, allowing all agents to monitor progress, assess results, identify bottlenecks, and propose refinements in real time. This paradigm reduces reliance on a single planner, supports adaptive plan updates, and minimizes redundant context passing, resulting in more scalable and cost-efficient execution. Evaluated on the GAIA benchmark, Anemoi achieved 52.73% accuracy with a small LLM (GPT-4.1-mini) as the planner, surpassing the strongest open-source baseline OWL (43.63%) by +9.09% under identical LLM settings. Our implementation is publicly available at https://github.com/Coral-Protocol/Anemoi.
Practical Applications
Immediate Applications
- Cost-optimized multi-agent orchestration for knowledge work — Sector: software, enterprise IT — What: Replace centralized, prompt-concatenation orchestrators with Anemoi’s semi-centralized A2A threads to coordinate web search, document processing, and coding agents for tasks like report generation, competitive analysis, and KPI dashboards — Tools/Workflow: Coral Protocol A2A MCP server, Planner (small LLM), Web/Document/Reasoning-Coding workers, Critique + Answer-Finding gating — Dependencies/Assumptions: Access to GPT-4.1-mini (or similar) for planner and stronger worker LLMs; stable toolkits for web automation and file I/O; network reliability for thread latency
- Engineering productivity copilot for triage, reproduction, and patching — Sector: software engineering — What: A2A-coordinated agents to reproduce bugs (web/file/state capture), propose fixes (Reasoning/Coding agent), verify with Critique agent, and package patches — Tools/Workflow: A2A threads bound to issue trackers (Jira/GitHub), CI triggers, unit/regression test scaffolding, consensus check before PR — Dependencies/Assumptions: Secure repo access; tool-use guardrails; deterministic test environments; audit needs addressed via thread logs
- Enterprise RPA with verifiable handoffs — Sector: operations, finance back office — What: Semi-centralized agents automate invoice extraction, reconciliations, spreadsheet transformations, and cross-system updates with critique-verified checkpoints — Tools/Workflow: Document Processing agent (PDF/DOCX), Reasoning/Coding agent (Excel/CSV scripting), Web agent (portal updates), consensus/approval steps for SOX/compliance — Dependencies/Assumptions: OCR/data-extraction quality; structured access to ERPs/CRMs; human-in-the-loop for exceptions
- Customer support co-resolution unit — Sector: customer service — What: Web agent mines KB/forums, Document agent pulls internal SOPs, Reasoning/Coding agent drafts fix scripts/macros, Critique enforces policy/compliance, Answer-Finding finalizes response — Tools/Workflow: CRM integration, A2A threads per ticket, escalation templates, consensus before customer-facing messages — Dependencies/Assumptions: Accurate KB indexing; rate limits on external sites; data-privacy constraints; latency SLAs
- Research/literature assistant with self-critique — Sector: academia, R&D — What: Multi-agent literature review, PDF table extraction, code replication (Reasoning/Coding agent), structured critique and consensus before drafting summaries — Tools/Workflow: A2A thread per research question, citation and evidence trackers, replication notebooks, final synthesis by Answer-Finding agent — Dependencies/Assumptions: Publisher access rights; accurate PDF parsing; compute sandbox for code execution
- Audit-ready workflow logs and provenance — Sector: governance, compliance — What: Use thread compartmentalization to capture who said what, when, and why for audit, reducing prompt-sprawl and information loss — Tools/Workflow: MCP thread archives, mention-based gating, consensus snapshots, immutable log storage — Dependencies/Assumptions: Retention policies; PII redaction; secure storage; regulator-acceptable traceability format
- Data labeling and review with agent consensus — Sector: ML/AI ops — What: Worker agents label examples (text, audio, doc), Critique agent flags uncertainty, consensus voting yields high-confidence labels — Tools/Workflow: Active learning loop, pass@k sampling, uncertain-case escalation, A2A adjudication — Dependencies/Assumptions: Toolchains for multi-modal ingestion; annotation policy encoding; budget for multi-pass labeling
- Education content and assessment generation — Sector: education — What: Web and Document agents source materials, Reasoning/Coding agent builds graded quizzes and auto-checkers, Critique validates correctness and levels, Answer-Finding produces educator-ready packs — Tools/Workflow: LMS connectors, difficulty calibration policies, per-topic A2A threads — Dependencies/Assumptions: Content licensing; bias/age-appropriateness checks; accessibility standards
- Low-cost planner swap for existing agent stacks — Sector: software platforms — What: Retrofit centralized MAS (e.g., OWL-like) to Anemoi’s A2A pattern to keep strong workers but downsize the planner for cost and scalability gains — Tools/Workflow: Adapter layer from CAMEL/LangChain/Assistants to MCP primitives (list_agents, create_thread, send_message, wait_for_mentions) — Dependencies/Assumptions: Compatibility of tool APIs; correct agent discovery; observability to detect planner capability gaps
- Incident response triage co-pilot — Sector: cybersecurity, IT ops — What: Web agent ingests threat intel, Document agent parses logs/reports, Reasoning agent correlates indicators, Critique validates hypotheses, consensus gates containment recommendations — Tools/Workflow: SIEM/SOAR integration, thread-based playbooks, consensus thresholds for auto-actions — Dependencies/Assumptions: Secure data paths; strict role separation; false-positive handling; time-bound latencies
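The thread-based coordination that several of the items above rely on can be sketched with a small in-memory stand-in. The four primitive names (list_agents, create_thread, send_message, wait_for_mentions) come from the list above; everything else here is an assumption made for illustration, including the class name, the `register` helper, the argument lists, and the non-blocking polling behavior. It is not the Coral Protocol server's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    thread_id: str
    sender: str
    content: str
    mentions: tuple


class InMemoryA2AServer:
    """Hypothetical in-process stand-in for an A2A thread server (illustration only)."""

    def __init__(self):
        self._agents = []    # registered agent names, in registration order
        self._threads = {}   # thread_id -> set of participant names
        self._log = []       # every message, in arrival order
        self._read = {}      # agent name -> index of first unread message

    def register(self, agent):
        # Assumed helper: real deployments would handle discovery differently.
        if agent not in self._agents:
            self._agents.append(agent)
            self._read[agent] = len(self._log)

    def list_agents(self):
        return list(self._agents)

    def create_thread(self, name, participants):
        unknown = set(participants) - set(self._agents)
        if unknown:
            raise ValueError(f"unknown agents: {sorted(unknown)}")
        thread_id = f"{name}-{len(self._threads) + 1}"
        self._threads[thread_id] = set(participants)
        return thread_id

    def send_message(self, thread_id, sender, content, mentions=()):
        members = self._threads[thread_id]
        if sender not in members:
            raise PermissionError(f"{sender} is not in thread {thread_id}")
        outside = set(mentions) - members
        if outside:
            raise ValueError(f"mentioned agents not in thread: {sorted(outside)}")
        message = Message(thread_id, sender, content, tuple(mentions))
        self._log.append(message)
        return message

    def wait_for_mentions(self, agent):
        # Polling stand-in: returns unread messages that mention `agent`
        # and marks everything up to now as read. A real server would block.
        start = self._read[agent]
        self._read[agent] = len(self._log)
        return [m for m in self._log[start:] if agent in m.mentions]


server = InMemoryA2AServer()
for name in ("planner", "web", "critique"):
    server.register(name)

thread = server.create_thread("report", ["planner", "web", "critique"])
server.send_message(thread, "planner", "Collect sources for the report.", mentions=["web"])
server.send_message(thread, "web", "Sources collected; please review.", mentions=["critique"])

web_inbox = server.wait_for_mentions("web")
critique_inbox = server.wait_for_mentions("critique")
planner_inbox = server.wait_for_mentions("planner")
```

In this sketch each agent only receives messages that explicitly mention it, which is one plausible reading of how mention-based gating avoids passing the full context to every agent.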
Long-Term Applications
- Cross-organization “Internet of Agents” for supply chains — Sector: manufacturing, logistics — What: Semi-centralized A2A threads across firms to reconcile orders, shipment events, and quality docs; consensus to resolve mismatches — Tools/Workflow: Inter-org MCP federation, identity/trust layers, provenance and payments — Dependencies/Assumptions: Standards for agent identity and auth; legal agreements; robust cross-domain security
- Clinical workflow co-pilot with adaptive planning — Sector: healthcare — What: Agents coordinate prior auth, coding, chart abstraction, and guideline checks with critique-based safety gating and detailed thread logs — Tools/Workflow: EHR connectors, medical coding toolkits, safety policies and clinician-in-the-loop consensus — Dependencies/Assumptions: Regulatory compliance (HIPAA/GDPR), rigorous validation, domain-tuned LLMs, fault-tolerant latency
- Legal e-discovery and case assembly — Sector: legal — What: Multi-agent ingestion of large corpora, timeline building, precedent linking, with Critique ensuring citation fidelity and Answer-Finding producing briefs — Tools/Workflow: MCP-driven doc pipelines, chain-of-custody logs, opposing-argument simulation threads — Dependencies/Assumptions: Access to licensed legal databases; high-precision extraction; liability and confidentiality controls
- Financial due diligence and KYC/AML orchestration — Sector: finance — What: Web and Document agents screen news/filings/sanctions, Reasoning agent consolidates risk narratives, Critique tests contradictions; consensus gates onboarding decisions — Tools/Workflow: Core banking/CRM APIs, risk policy codification, explainable thread logs for regulators — Dependencies/Assumptions: Data source reliability; regulator-acceptable audit trails; bias mitigation; strict latency/uptime
- Autonomous software teams for feature delivery — Sector: software — What: Planner decomposes epics, workers design, implement, and test, Critique enforces standards, consensus ships via CI/CD — Tools/Workflow: Agent IDEs, design-review threads, test-generation pipelines, release gating via votes — Dependencies/Assumptions: Stronger reasoning/coding LLMs; robust sandboxing; IP/security policies; human oversight norms
- Multi-robot and cyber-physical coordination — Sector: robotics, manufacturing, logistics — What: Extend A2A to embodied agents with real-time state sharing, bottleneck detection, and plan adaptation — Tools/Workflow: RT messaging bridges to ROS/OPC-UA, safety monitors, failover consensus — Dependencies/Assumptions: Hard real-time constraints; safety certification; reliable localization/sensing; edge compute
- Government digital casework and FOIA processing — Sector: public sector — What: Agents triage requests, extract records, redact PII, and compile responses with critique-based compliance checks — Tools/Workflow: Records systems connectors, redaction toolchains, policy-encoded critique — Dependencies/Assumptions: Statutory compliance; public-records access; auditability; accessibility requirements
- Scientific discovery pipelines — Sector: science/biotech — What: Literature grounding, protocol design, data analysis scripts, and replication/critique loops; integrate with lab automation long-term — Tools/Workflow: A2A experimental planning, notebook generation, provenance tracking, lab-robot APIs — Dependencies/Assumptions: High-accuracy extraction; domain-tuned models; wet-lab integration; IRB/ethics oversight
- Edge/private deployments with small planners — Sector: privacy-first enterprises, defense — What: Run planners on-prem/edge while bursting worker tasks to cloud for heavy lifting, reducing data egress — Tools/Workflow: Hybrid MCP topology, policy-based data routing, latency-aware scheduling — Dependencies/Assumptions: Reliable split-compute orchestration; strict data-classification rules; offline fallbacks
- Standardization and policy for agent interoperability — Sector: policy, standards — What: Advance MCP-like A2A as an interop standard (identity, observability, safety hooks), enabling regulated, multi-vendor agent ecosystems — Tools/Workflow: Compliance profiles, conformance suites, security best practices for agent messaging — Dependencies/Assumptions: Broad vendor buy-in; governance bodies; reference implementations; red-team evaluations
- SOC and fraud operations with layered consensus — Sector: cybersecurity, fintech — What: Multi-agent alert triage, case enrichment, hypothesis testing, and consensus-driven containment or case escalation — Tools/Workflow: Playbook threads, risk-scoring critique modules, action gating with human approval — Dependencies/Assumptions: High-fidelity signals; adversarial robustness; strict false-positive costs; round-trip SLAs
- Safety-aligned, injection-resilient agent networks — Sector: AI safety, platform security — What: Leverage contextual compartmentalization and explicit mentions to reduce prompt injection spread; critique agents for continuous red-teaming — Tools/Workflow: Safety policies embedded in critique, sandboxed tool execution, cross-thread anomaly detection — Dependencies/Assumptions: Mature detection methods; telemetry across agents; standardized incident handling
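Several items in both lists gate an action on agent consensus (labels, containment recommendations, release votes). A minimal sketch of such a gate follows, assuming a simple supermajority rule. The function name `consensus_gate`, the default threshold, and the escalate-on-`None` convention are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter


def consensus_gate(votes, threshold=2 / 3):
    """Return (decision, support) for a dict mapping agent name -> proposed option.

    `decision` is the most common option when its share of votes reaches
    `threshold`; otherwise it is None, which the caller can treat as
    "escalate to a human". The threshold must exceed 0.5 so ties never pass.
    """
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1.0]")
    if not votes:
        return None, 0.0
    option, count = Counter(votes.values()).most_common(1)[0]
    support = count / len(votes)
    decision = option if support >= threshold else None
    return decision, support


votes = {"web": "contain", "reasoning": "contain", "critique": "monitor"}
decision, support = consensus_gate(votes)                  # two of three agree
strict_decision, _ = consensus_gate(votes, threshold=0.9)  # not enough support
```

A stricter threshold trades automation for safety: in the example, the same votes pass a two-thirds gate but would be escalated under a 0.9 gate.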
Notes on feasibility across applications:
- The paper’s demonstrated gains on GAIA with a small planner suggest immediate cost/performance benefits when swapping planners, but reliability still depends on worker tool quality and handling of web-agent latency.
- Thread compartmentalization and explicit agent mentions improve auditability and reduce token overhead now; high-stakes domains will require extensive validation, domain-tuned models, and human oversight.
- Open-source availability of Anemoi and the Coral Protocol MCP server accelerates prototyping; production deployments must address security, privacy, and regulatory compliance from the outset.
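The auditability point in the notes above depends on thread logs that cannot be silently altered. One common way to get tamper evidence is a hash chain over log entries, sketched below. This is a generic technique, not something the paper describes; the field names, the `append_entry` and `verify` helpers, and the omission of timestamps are assumptions made to keep the example short and deterministic.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    # Canonical JSON (sorted keys) so the same body always hashes identically.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log, thread_id, sender, content):
    """Append a message record whose hash covers the previous entry's hash."""
    body = {
        "thread_id": thread_id,
        "sender": sender,
        "content": content,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify(log):
    """Return True if every entry links to its predecessor and matches its hash."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log = []
append_entry(log, "ticket-1", "planner", "Assign the lookup to the web agent.")
append_entry(log, "ticket-1", "critique", "Result verified against the source.")
intact = verify(log)

log[0]["content"] = "Edited after the fact."
tampered_ok = verify(log)
```

Because each hash covers the previous one, editing any earlier entry invalidates the chain from that point on, so `intact` is true and `tampered_ok` is false in this example.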