Multi-Agent Coordination via Multi-Level Communication

Published 26 Sep 2022 in cs.MA and cs.LG | (2209.12713v2)

Abstract: The partial observability and stochasticity in multi-agent settings can be mitigated by accessing more information about others via communication. However, the coordination problem still exists since agents cannot communicate actual actions with each other at the same time due to the circular dependencies. In this paper, we propose a novel multi-level communication scheme, Sequential Communication (SeqComm). SeqComm treats agents asynchronously (the upper-level agents make decisions before the lower-level ones) and has two communication phases. In the negotiation phase, agents determine the priority of decision-making by communicating hidden states of observations and comparing the value of intention, obtained by modeling the environment dynamics. In the launching phase, the upper-level agents take the lead in making decisions and then communicate their actions with the lower-level agents. Theoretically, we prove the policies learned by SeqComm are guaranteed to improve monotonically and converge. Empirically, we show that SeqComm outperforms existing methods in various cooperative multi-agent tasks.

Summary

The paper introduces SeqComm, a two-phase asynchronous communication scheme in MARL that uses dynamic priority setting based on predicted intentions.
It uses a negotiation phase in which agents share hidden states and compare the values of their predicted trajectories (intentions) to set the decision order, avoiding the circular dependencies of synchronous communication.
Empirical evaluations in MPE and SMAC show that SeqComm outperforms existing communication methods, with higher rewards and stable convergence.

The paper "Multi-Agent Coordination via Multi-Level Communication" (2209.12713) presents SeqComm, a communication scheme for multi-agent reinforcement learning (MARL) that coordinates agents through a two-phase asynchronous protocol. It targets a limitation of synchronous communication: when all agents decide at the same time, none of them can condition on the others' actual actions, because each action would depend on the others' actions at the same step (a circular dependency). SeqComm breaks this cycle by dynamically assigning decision-making priorities based on the agents' communicated intentions.

Sequential Communication Design

SeqComm divides communication into two distinct phases: negotiation and launching.

Negotiation Phase

In the negotiation phase, agents communicate hidden states derived from their current observations. Using these states and a shared world model of the environment dynamics, each agent predicts a future trajectory, called its "intention", that anticipates how the other agents would act under a given decision order. The value of this predicted trajectory (the intention value) determines the decision-making order.
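The intention-value computation can be sketched as an imagined rollout. This is a minimal illustration under stated assumptions, not the paper's implementation: `ToyWorldModel` is a hypothetical random linear model standing in for the learned world model, its state is assumed to be the concatenation of the communicated hidden states, `toy_policy` is a made-up policy, and the other agents are assumed to follow a fixed order after the candidate first mover. `horizon` and `gamma` are illustrative parameters.

```python
import numpy as np


class ToyWorldModel:
    """Hypothetical stand-in for a learned world model (assumption):
    random linear dynamics over a state assumed to be the concatenated hidden states."""

    def __init__(self, state_dim: int, n_agents: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.n_agents = n_agents
        self.A = rng.normal(scale=0.3, size=(state_dim, state_dim))
        self.B = rng.normal(scale=0.3, size=(state_dim, n_agents))
        self.w = rng.normal(size=state_dim)

    def step(self, state, joint_action):
        """Predict the next state and a scalar team reward."""
        next_state = np.tanh(self.A @ state + self.B @ joint_action)
        return next_state, float(self.w @ next_state)


def toy_policy(agent, state, upper_actions):
    """Hypothetical policy: reacts to one state entry and to upper-level actions."""
    return float(np.tanh(state[agent % len(state)] - 0.1 * sum(upper_actions.values())))


def intention_value(agent, others, state, model, policy, horizon=3, gamma=0.99):
    """Imagined discounted return when `agent` decides first and `others` follow in order."""
    order = [agent, *others]
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        joint = np.zeros(model.n_agents)
        upper = {}
        for j in order:
            # each agent sees only the actions of agents placed above it in this step
            upper[j] = policy(j, state, upper)
            joint[j] = upper[j]
        state, reward = model.step(state, joint)
        total += discount * reward
        discount *= gamma
    return total


model = ToyWorldModel(state_dim=4, n_agents=3)
s0 = np.array([0.1, -0.2, 0.3, 0.0])  # assumed concatenation of communicated hidden states
values = {i: intention_value(i, [j for j in range(3) if j != i], s0, model, toy_policy)
          for i in range(3)}
```

Each entry of `values` is the imagined return for one candidate first mover; these are the numbers the agents would then compare.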

Agents broadcast their intention values and compare them, repeating this exchange until the full priority sequence is fixed, so that agents with the most promising intentions decide first. This ordering is also meant to resolve conflicts that arise from relative overgeneralization in synchronous communication schemes.
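The repeated compare-and-select loop can be sketched as follows. `negotiate_priority` and its `intention_value(agent, decided)` callback are hypothetical names; the sketch assumes each round places the remaining agent with the highest reported value and breaks ties by the agents' original listing order.

```python
from typing import Callable, List, Sequence


def negotiate_priority(agents: Sequence[int],
                       intention_value: Callable[[int, List[int]], float]) -> List[int]:
    """Build a priority order by repeatedly selecting the highest intention value.

    `intention_value(agent, decided)` is a hypothetical callback (assumption) returning
    the value `agent` would broadcast, given the agents already placed in `decided`.
    """
    remaining = list(agents)
    order: List[int] = []
    while remaining:
        # every remaining agent reports its value; all agents compare the same numbers
        values = {a: intention_value(a, list(order)) for a in remaining}
        best = max(remaining, key=lambda a: values[a])  # ties: first in `remaining`
        order.append(best)
        remaining.remove(best)
    return order
```

With static values such as `{0: 0.2, 1: 0.9, 2: 0.5}`, this reduces to sorting in descending order and returns `[1, 2, 0]`; the loop only matters when an agent's value depends on which agents have already been placed.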

Figure 1: Overview of SeqComm. Sequential communication involves negotiation, where agents share hidden states and set the decision order, and launching, where upper-level agents decide first and pass their actions to lower-level agents.

Launching Phase

Once the priority is set, upper-level agents make their decisions first and broadcast their actions to lower-level agents, so each agent's decision is conditioned on the actions of the agents above it. This reduces uncertainty about teammates' behavior and improves coordination. Although decisions are made sequentially, all agents execute their actions simultaneously once every agent has decided, so the environment still advances in a single synchronous step.
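A minimal sketch of the launching phase, assuming a discrete action space: `launch` and `pick_free_target` are hypothetical, and the toy policy (each agent takes the lowest-indexed target not already claimed by an upper-level agent) is only meant to mimic landmark assignment in cooperative navigation.

```python
from typing import Callable, Dict, Sequence

# Hypothetical policy signature (assumption): (agent, upper-level actions) -> action
Policy = Callable[[int, Dict[int, int]], int]


def launch(order: Sequence[int], policy: Policy) -> Dict[int, int]:
    """Decide actions level by level, then return the joint action for one env step."""
    decided: Dict[int, int] = {}
    for agent in order:
        # a lower-level agent conditions on the actions already sent by upper-level agents
        decided[agent] = policy(agent, dict(decided))
    return decided  # all actions are executed together after every agent has decided


def pick_free_target(agent: int, upper_actions: Dict[int, int], n_targets: int = 3) -> int:
    """Toy policy (assumption): take the lowest target index not claimed above."""
    taken = set(upper_actions.values())
    for target in range(n_targets):
        if target not in taken:
            return target
    return 0  # every target is taken; fall back to target 0
```

Here `launch([2, 0, 1], pick_free_target)` returns `{2: 0, 0: 1, 1: 2}`, so every agent ends up on a different target. If all three agents instead decided simultaneously with no upper-level actions, each would call `pick_free_target(a, {})` and choose target 0, which is the kind of collision that sequential decisions avoid.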

Figure 2: Illustration of the learned priority of decision-making in predator-prey (PP, upper panel) and cooperative navigation (CN, lower panel). In PP, prey are shown in black and predators (agents) in grey; in CN, landmarks are shown in black and agents in grey. Panels a to e show the priority order.

Theoretical Guarantees

SeqComm offers two main theoretical guarantees:

Monotonic Improvement: Each agent’s policy updates are shown to ensure monotonic improvement of the overall joint policy, regardless of the priority sequence. This is proven using a new decentralized partially observable Markov decision process (Dec-POMDP) model, incorporating agent-by-agent updates facilitated by sequential decision-making.
Convergence: The learned policies are also guaranteed to converge. The dynamic priority mechanism is further intended to keep learning from stalling in the poor local optima that relative overgeneralization can cause in synchronous systems.
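How monotonic improvement relates to convergence can be illustrated with the standard argument for bounded monotone sequences. This is a generic sketch assuming bounded rewards and a discount factor below 1, not a reproduction of the paper's proof; $J$, $R_{\max}$, and $\boldsymbol{\pi}_k$ (the joint policy after update $k$) are notation introduced here.

```latex
\begin{align*}
J(\boldsymbol{\pi}) &= \mathbb{E}_{\boldsymbol{\pi}}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_t\Big],
  \qquad |r_t| \le R_{\max},\quad 0 \le \gamma < 1 \\
J(\boldsymbol{\pi}_{k+1}) &\ge J(\boldsymbol{\pi}_{k}) \quad \text{for all } k
  && \text{(monotonic improvement)} \\
J(\boldsymbol{\pi}_{k}) &\le \frac{R_{\max}}{1-\gamma} \quad \text{for all } k
  && \text{(boundedness)} \\
&\Longrightarrow \lim_{k \to \infty} J(\boldsymbol{\pi}_{k}) \text{ exists}
  && \text{(monotone convergence theorem)}
\end{align*}
```

This argument shows only that the sequence of expected returns converges; statements about the policies themselves require further reasoning.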

Empirical Evaluation

SeqComm is evaluated in cooperative multi-agent environments, including the Multi-Agent Particle Environment (MPE) and the StarCraft Multi-Agent Challenge (SMAC). It consistently outperformed other MARL communication methods, achieving higher mean rewards and win rates.

MPE and SMAC Experiments

In the MPE experiments, SeqComm showed stronger coordination in tasks such as predator-prey (PP) and cooperative navigation (CN). The results indicate that the dynamically determined priorities let agents form more effective joint strategies, reducing collisions and improving how agents divided targets among themselves.

In SMAC, the experiments use customized maps with reduced observation ranges, which makes communication more important; on these maps SeqComm achieved higher win rates than the baselines.

Figure 3: Learning curves in terms of the win rate of SeqComm and baselines on four customized SMAC maps.

Ablation Studies

The paper includes comprehensive ablation studies to examine the significance of dynamic priority setting and reduced communication range in SeqComm:

Dynamic Priority Setting: Compared with fixed or random priority orders, SeqComm's dynamic priority yielded substantial performance improvements, indicating that adjusting the order based on up-to-date intention values is important.
Reduced Communication Range: When each agent can communicate only with agents within a limited range, SeqComm still coordinated effectively, suggesting that the scheme is communication-efficient and robust to restricted communication.
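The reduced-range setting can be illustrated by filtering which upper-level actions a lower-level agent receives. This sketch assumes 2-D positions and a Euclidean distance threshold; `visible_upper_actions` and `comm_range` are hypothetical names, and the paper's exact communication rule may differ.

```python
import math
from typing import Dict, Sequence, Tuple

Pos = Tuple[float, float]


def visible_upper_actions(agent: int, order: Sequence[int], decided: Dict[int, int],
                          positions: Dict[int, Pos], comm_range: float) -> Dict[int, int]:
    """Actions of upper-level agents within `comm_range` of `agent`.

    Assumption: communication succeeds iff the Euclidean distance is <= comm_range.
    """
    upper = order[:order.index(agent)]  # agents placed above `agent` in the priority order
    return {u: decided[u] for u in upper
            if u in decided and math.dist(positions[agent], positions[u]) <= comm_range}
```

For agents at (0, 0), (1, 0), and (5, 0) with order `[0, 2, 1]` and `comm_range=2.0`, agent 1 receives only agent 0's action, since agent 2 is 4 units away.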

Figure 4: Ablation studies on reduced communication range in PP and CN.

Conclusion

SeqComm introduces a multi-level communication framework for MARL that improves agent coordination through asynchronous decision-making. By separating the order in which agents decide from the simultaneous execution of their actions, SeqComm addresses the circular-dependency problem in coordination and adapts across different environments. The theoretical guarantees and the empirical results on MPE and SMAC support SeqComm's effectiveness and suggest its potential for real-world multi-agent systems. Future work could explore non-stationary environments and larger groups of agents.