CommCP: Efficient Multi-Agent Coordination via LLM-Based Communication with Conformal Prediction

Published 5 Feb 2026 in cs.RO, cs.AI, cs.CV, cs.LG, and cs.MA | (2602.06038v1)

Abstract: To complete assignments provided by humans in natural language, robots must interpret commands, generate and answer relevant questions for scene understanding, and manipulate target objects. Real-world deployments often require multiple heterogeneous robots with different manipulation capabilities to handle different assignments cooperatively. Beyond the need for specialized manipulation skills, effective information gathering is important in completing these assignments. To address this component of the problem, we formalize the information-gathering process in a fully cooperative setting as an underexplored multi-agent multi-task Embodied Question Answering (MM-EQA) problem, which is a novel extension of canonical Embodied Question Answering (EQA), where effective communication is crucial for coordinating efforts without redundancy. To address this problem, we propose CommCP, a novel LLM-based decentralized communication framework designed for MM-EQA. Our framework employs conformal prediction to calibrate the generated messages, thereby minimizing receiver distractions and enhancing communication reliability. To evaluate our framework, we introduce an MM-EQA benchmark featuring diverse, photo-realistic household scenarios with embodied questions. Experimental results demonstrate that CommCP significantly enhances the task success rate and exploration efficiency over baselines. The experiment videos, code, and dataset are available on our project website: https://comm-cp.github.io.


Authors (5)

Summary

The paper introduces CommCP, a framework that fuses LLM-based communication with conformal prediction to ensure reliable, calibrated messages in multi-robot cooperation.
It details a modular approach integrating vision-language models and chain-of-thought reasoning to assess object relevance and calibrate inter-agent communication.
Empirical tests on the HM3D benchmark show significant efficiency gains, reducing task completion time and boosting success rates compared to non-communicative baselines.

LLM-Calibrated Communication for Multi-Agent Embodied Task Completion: An Expert Analysis of CommCP

Problem Context and MM-EQA Formalization

CommCP introduces an LLM-based, conformal prediction (CP) calibrated communication framework targeting efficient multi-robot cooperation on the Multi-Agent Multi-Task Embodied Question Answering (MM-EQA) problem (2602.06038). The MM-EQA formulation extends canonical Embodied Question Answering (EQA)—where an embodied agent answers factual or semantic queries by actively exploring an environment—with a scenario involving multiple heterogeneous robots. Each robot is assigned different, non-transferable tasks, but can supplement its own explorations and scene interpretations with peer-generated semantic observations and answers, with the overarching objective of maximizing task success rates and minimizing exploration time.

This task structure situates MM-EQA as a quintessential embodied intelligence challenge, demanding not only efficient spatial exploration and perception-action integration, but also robust, high-relevance inter-agent communication. The inherent unreliability (overconfidence or hallucination) in LLM outputs for natural language communication poses a critical risk—uncalibrated, ambiguous, or misleading messages can degrade joint efficiency and erode collaborative success.

CommCP Framework Architecture

CommCP explicitly addresses the twin issues of message relevance and confidence calibration by fusing LLM-based communications with conformal prediction guarantees. The general system architecture subdivides agent functionality into perception, communication, planning (navigation), and a dedicated confidence check. Each robot, at every time step, fuses visual scene observations (detected objects from VLMs), task prompts, and incoming peer messages.

Figure 2: The architectural overview of CommCP—perception, communication, planning, and confidence modules with CP-based LLM calibration for reliable message exchange in MM-EQA environments.

Key workflow details include:

Object Detection: Each agent deploys a vision-language model (VLM) to produce a structured representation of observed objects, classified by type and attribute (e.g., color).
LLM-Based Relevance Reasoning: On receiving object-check requests from other agents, the local LLM evaluates observed–requested object pairs using chain-of-thought prompts, generating categorical relevancy (options A–D; e.g., directly found, highly relevant, irrelevant, or common).
Confidence Calibration via Conformal Prediction: To avoid propagating overconfident or misleading messages, robots employ split conformal prediction. A threshold is set from an empirical quantile of nonconformity scores on a held-out calibration set, which bounds the miscoverage rate at a user-specified level (e.g., at most 5% miscoverage, i.e., at least 95% coverage of the correct option).
Message Generation & Reception: Calibrated messages distill only highly-confident, relevant semantic references, which are then projected onto spatial semantic value (SV) maps and incorporated into the agent navigation policy (frontier-based exploration weighted by SV).
Question Answering & Confidence Check: For both local and shared queries, VLM/LLM outputs undergo a confidence gating process—only answers exceeding a combined (answer × relevance) threshold are accepted and propagated.
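The calibration step in the workflow above can be illustrated with a minimal split conformal prediction sketch over the four relevance options (A to D). This is not the authors' implementation: the function names, the nonconformity score (one minus the probability of the true option), and the toy probabilities are assumptions chosen for illustration.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.05):
    """Compute the score threshold q_hat from a held-out calibration set.

    cal_probs: list of dicts mapping option -> LLM probability (assumed input)
    cal_labels: list of the true option for each calibration example
    alpha: target miscoverage rate (0.05 means roughly 95% coverage)
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true option.
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every option.
        return 1.0
    return scores[rank - 1]

def prediction_set(probs, q_hat):
    """Return every option whose nonconformity score is within the threshold."""
    return [option for option, p in probs.items() if 1.0 - p <= q_hat]
```

Under this sketch, a robot could choose to send a message only when the prediction set contains a single option, and stay silent when the set is empty or ambiguous. That sending rule is an assumption here, not something the summary specifies.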
Figure 1: Multi-agent coordination through calibrated natural language messaging in a household task environment.
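The confidence check on answers described in the workflow can be sketched as a simple product gate. The function name and the threshold value of 0.5 are hypothetical; the summary states only that the combined (answer × relevance) score must exceed a threshold.

```python
def accept_answer(answer_conf, relevance_conf, threshold=0.5):
    """Accept an answer only if answer confidence times relevance
    confidence strictly exceeds the threshold (threshold is an assumed value)."""
    return answer_conf * relevance_conf > threshold
```

For example, an answer with confidence 0.9 and relevance 0.8 passes this gate, while the same answer with relevance 0.4 is held back.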

Experimental Benchmarks and Quantitative Analysis

CommCP is evaluated on a purpose-built MM-EQA benchmark built on the Habitat-Matterport 3D (HM3D) dataset, covering a broad range of photorealistic, semantically rich household scenarios. Each scenario is annotated with six EQA-style queries (spanning location, identification, counting, existence, and state) and assigned to a team of two or three robots, enabling systematic assessment of cooperative exploration and answering.

Performance metrics comprise:

Success Rate (SR): Proportion of correct answers across all robot-task pairs.
Normalized Time Cost (NTC): Aggregate time (movement + communication) normalized for comparative efficiency analysis.
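As a rough illustration of the two metrics, the sketch below computes SR over robot-task pairs and an NTC value. The normalization by a fixed time budget is an assumption, since the summary does not state what the aggregate time is normalized against; both function names are hypothetical.

```python
def success_rate(results):
    """Fraction of correct answers; results holds one boolean per robot-task pair."""
    return sum(results) / len(results) if results else 0.0

def normalized_time_cost(move_time, comm_time, time_budget):
    """Movement plus communication time divided by an assumed time budget."""
    return (move_time + comm_time) / time_budget
```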

Strong numerical results are presented:

Efficiency Gains:
- CommCP achieves an SR of 0.68 at NTC 0.4, compared to an SR of 0.65 at NTC 0.8 for the non-communicative baseline (MMFBE): a slightly higher success rate at half the normalized time cost. Mean task completion time falls from 594 s (MMFBE) to 445 s (CommCP), a reduction of about 25%.
Impact of Conformal Prediction:
- The "No-CP" ablation (removing calibration) drops performance to that of independent explorers, suggesting that uncalibrated LLM outputs are either ignored or actively misleading.
Information Quality vs. Quantity:
- Scalability experiments and ablations (controlled message volume, answer-sharing toggles) indicate that success depends on message precision, not transmission frequency. Non-selective information sharing dampens the benefits, underscoring the centrality of CP-based gating.
Scalability and Latency:
- With scene size growth (e.g., $L \times W \geq 250\,\text{m}^2$), CommCP's efficiency advantage widens, indicating robustness to state and communication space complexity.
- Performance remains resilient across messaging latency regimes; speedier message passing primarily accelerates convergence, but final SRs are comparable once sufficient exploration occurs.
Figure 3: Joint SR–NTC efficiency curves, ablation studies, and scalability analysis across two- and three-robot teams.
Figure 4: Example visualizations of spatial semantic value maps and agent trajectories, highlighting the guiding effect of calibrated communication on efficient exploration.
Figure 5: NTC performance delta (Advantage) between CommCP and MMFBE as a function of environment area, confirming enhanced benefits in larger scenes.
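The semantic value maps shown in Figure 4 feed the SV-weighted frontier exploration mentioned in the workflow. One way such weighting could work is sketched below. The softmax weighting, the temperature parameter, and the function names are all assumptions for illustration; the summary does not specify how SV values are turned into frontier choices.

```python
import math
import random

def frontier_weights(semantic_values, temperature=1.0):
    """Turn per-frontier semantic values into sampling probabilities
    via a softmax (an assumed weighting scheme)."""
    top = max(semantic_values)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp((v - top) / temperature) for v in semantic_values]
    total = sum(exps)
    return [e / total for e in exps]

def pick_frontier(frontiers, semantic_values, rng=random):
    """Sample one frontier, favoring those with higher semantic value."""
    weights = frontier_weights(semantic_values)
    return rng.choices(frontiers, weights=weights, k=1)[0]
```

In this sketch, a calibrated peer message that raises the semantic value near one frontier makes that frontier more likely to be explored next, without ruling out the others.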

Theoretical and Practical Implications

CommCP advances multi-agent embodied intelligence on several fronts:

Theoretical Rigor: The disentanglement of message credibility from language-model outputs via CP introduces statistical reliability to communication within decentralized teams—an element typically ignored in LLM-centric cooperative navigation.
Causal Communication: By operationalizing message exchange through calibrated relevance, information propagation in the multi-agent system becomes meaningfully informative rather than simply verbose, addressing the bandwidth and distraction risks highlighted in prior literature.
Exploratory Efficiency and Robustness: The observed scalability and efficiency improvements are nontrivial for future deployment in heterogeneous real-world settings, particularly in open-domain home service robotics where uncertainty and task interdependence are pervasive.
Framework Generalizability: The modular separation of perception, planning, LLM-based reasoning, and confidence calibration offers extensibility to larger teams and more complex task structures, and suggests a path for CP integration in multi-agent negotiation, resource allocation, or decentralized RL.

Future Prospects

Potential extensions include:

Scaling to Larger Teams: As agent count and environment complexity increase, efficient protocols for distributed calibration, hierarchical information aggregation, and decentralized CP computations will become crucial.
Integration with Advanced VLMs and LLMs: The demonstrated approach currently relies on open-source LLMs because it needs access to output probabilities. It could be upgraded with more powerful, potentially fine-tuned models as they become more widely available, provided they expose those probabilities.
Real-World Deployment: Bridging the sim-to-real gap will require additional robustness against perception errors, dynamic environments, and variable communication topology, but the value of confidence calibration shown in simulation is anticipated to transfer.
Advanced Cooperative Behaviors: Incorporating active query generation, reward shaping for information value, and dynamic role assignment could extend calibrated information flow toward new forms of emergent team intelligence.

Conclusion

CommCP establishes a formally grounded, statistically calibrated approach to LLM-driven multi-agent cooperation in embodied environments, and its results indicate that communicative precision, driven by conformal prediction, is key to efficiency and scalability. The framework offers a template for trustworthy inter-agent communication, setting the stage for advances in real-world multi-robot task completion and generalizable approaches to decentralized, LLM-mediated AI systems.
