Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Published 21 May 2016 in cs.AI, cs.LG, and cs.MA | (1605.06676v2)

Abstract: We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.


Citations (1,522)


Summary

The paper presents two novel methods, RIAL and DIAL, that empower agents to learn communication protocols in partially observable settings.
It demonstrates that parameter sharing and gradient propagation significantly enhance coordinated performance in tasks like the Switch Riddle and MNIST games.
The work highlights how differentiable communication reduces reliance on trial-and-error, offering insights into language evolution in AI systems.


Introduction

The paper "Learning to Communicate with Deep Multi-Agent Reinforcement Learning" (1605.06676) addresses the problem of multiple agents sensing and acting in an environment with the goal of maximizing their shared utility, which requires them to learn communication protocols for sharing task-relevant information. It introduces two approaches, Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL), which use deep neural networks to learn communication protocols end-to-end in partially observable environments.

In the context of deep multi-agent reinforcement learning (MARL), the agents must autonomously develop and agree upon effective communication strategies to solve cooperative tasks. Such tasks impose challenges due to partial observability and the necessity for coordinated communication across agents operating under decentralized execution constraints.

Problem Setting

The paper explores sequential decision-making problems involving multiple agents in partially observable environments. Each agent receives private observations that are only correlated with the underlying Markov state. Communication is facilitated through a discrete limited-bandwidth channel, compelling agents to discover a protocol that sufficiently supports behavior coordination to solve the task.

Centralized learning but decentralized execution is emphasized, making this paradigm relevant for practical applications such as training robotic teams in simulators. The paper also briefly reviews decentralized learning baselines to provide a comprehensive perspective on the MARL setting.

Methods

Reinforced Inter-Agent Learning (RIAL)

RIAL combines deep Q-learning with a recurrent network to cope with partial observability. Agents can learn with independent or shared parameters; sharing reduces the number of parameters to be learned and speeds up learning significantly. Execution remains decentralized in both cases: even with shared parameters, agents receive different observations and therefore can still choose different actions.
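As a rough illustration of how a RIAL agent might choose its outputs at each step, the sketch below assumes the recurrent Q-network has already produced two lists of Q-values, one for environment actions and one for messages, and picks from each with its own epsilon-greedy draw. The function names and the example Q-values are hypothetical and are not taken from the paper's code; the network itself is not modelled.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random index with probability epsilon, else the argmax index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def select_action_and_message(q_env, q_msg, epsilon, rng=None):
    """Choose an environment action and a message independently.

    q_env and q_msg stand in for the two output heads of a recurrent
    Q-network (hypothetical inputs for this sketch).
    """
    if rng is None:
        rng = random.Random()
    action = epsilon_greedy(q_env, epsilon, rng)
    message = epsilon_greedy(q_msg, epsilon, rng)
    return action, message


# Greedy selection (epsilon = 0) with made-up Q-values.
greedy_choice = select_action_and_message([0.1, 0.7], [0.3, -0.2], epsilon=0.0)
```

Selecting the action and the message separately keeps the number of network outputs at the sum of the two choice sets rather than their product, which is one plausible reason to structure the selection this way.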

Figure 1: RIAL - RL based communication

Differentiable Inter-Agent Learning (DIAL)

DIAL exploits centralised learning by letting gradients flow between agents through the communication channel. During learning, agents exchange real-valued messages; during decentralised execution, these messages are discretized. The gradients give each sender richer feedback about its messages than a scalar reward does, which reduces reliance on trial-and-error exploration and makes effective protocols easier to discover.

Because the channel is differentiable during learning, DIAL can be trained end-to-end across agents rather than only within each agent, which is what ties the approach so closely to deep learning.
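The switch between real-valued messages during learning and discrete messages during execution is handled in the paper by what it calls a discretise/regularise unit (DRU). The sketch below is a minimal scalar version, assuming a single one-bit message, Gaussian channel noise with standard deviation `sigma`, and a logistic squashing function. The function names are hypothetical, and the derivative is written out by hand only to show what an autodiff framework would propagate back to the sender.

```python
import math
import random


def logistic(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dru_train(m, sigma, rng):
    """Learning mode: add Gaussian noise, then squash to the interval (0, 1).

    Returns the noisy real-valued message and its derivative with respect
    to m, which is what lets the receiver's error flow back to the sender.
    """
    noisy = m + rng.gauss(0.0, sigma)
    m_hat = logistic(noisy)
    grad = m_hat * (1.0 - m_hat)
    return m_hat, grad


def dru_execute(m):
    """Execution mode: hard-threshold the message to a single bit."""
    return 1 if m > 0 else 0


rng = random.Random(0)
train_msg, train_grad = dru_train(2.0, sigma=0.5, rng=rng)
exec_bits = [dru_execute(m) for m in (-3.0, 0.4, 5.0)]
```

Note that the gradient shrinks as the pre-noise activation `m` moves far from zero, so the same unit that makes messages robust to noise also saturates once they are well separated.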

Experiments and Results

The paper evaluates RIAL and DIAL empirically on newly proposed environments, including the Switch Riddle and the MNIST Games, where both methods outperform baseline methods.

Switch Riddle

The Switch Riddle task demonstrates the capability of agents to coordinate under limited communication windows, with parameter sharing proving critical in achieving optimal performance, especially as agent numbers increase.
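To make the setting concrete, here is a toy environment sketch of a switch riddle. It assumes the usual formulation: at each step one agent, chosen uniformly at random, enters a room containing a one-bit switch, may set the switch, and may "tell" that every agent has visited the room. A correct tell gives the team a reward of +1 and an incorrect one gives -1. The class name, method signatures, and the `max_steps` value are hypothetical choices for illustration and may differ from the paper's exact environment.

```python
import random


class SwitchRiddle:
    """Toy riddle: one agent per step enters a room with a one-bit switch."""

    def __init__(self, n_agents, max_steps, seed=None):
        self.n_agents = n_agents
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.switch = 0          # the one-bit communication channel
        self.visited = set()     # agents that have been in the room so far
        self.t = 0
        self.done = False
        self._enter_next_agent()
        return self.current, self.switch

    def _enter_next_agent(self):
        self.current = self.rng.randrange(self.n_agents)
        self.visited.add(self.current)

    def step(self, switch_bit, tell):
        """The agent in the room sets the switch and optionally tells.

        Returns (agent_in_room, switch, reward, done); the reward is shared.
        """
        if self.done:
            raise RuntimeError("episode finished; call reset()")
        self.switch = 1 if switch_bit else 0
        self.t += 1
        reward = 0.0
        if tell:
            reward = 1.0 if len(self.visited) == self.n_agents else -1.0
            self.done = True
        elif self.t >= self.max_steps:
            self.done = True
        else:
            self._enter_next_agent()
        return self.current, self.switch, reward, self.done


env = SwitchRiddle(n_agents=3, max_steps=6, seed=0)
agent, switch = env.reset()
# Telling on the very first step is wrong when there are three agents.
_, _, early_reward, finished = env.step(switch_bit=1, tell=True)
```

The only information that passes between agents is the switch state, so any successful policy has to assign a meaning to that single bit.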

Figure 2: Evaluation of n=3

MNIST Games

The Color-Digit MNIST and Multi-Step MNIST games further validate DIAL's effectiveness, showcasing its superior ability to optimize message content compared to RIAL. The experiments also underline the role of differentiable communication in addressing stochastic reward functions, emphasizing DIAL's advantage in multi-step information integration.

Figure 3: Evaluation of Multi-Step

Effect of Noise

The analysis asks why language evolved to be discrete and offers a related observation: noise added to DIAL's communication channel during learning pushes agents toward messages that are well separated and can therefore be discretized without losing information. This offers a possible parallel between the emergence of discrete language in humans and in artificial agents.

Figure 4: Distribution of regularised messages, $P(\hat m | m)$, for different noise levels. Shading indicates $P(\hat m | m) > 0.1$. Blue bars show a division of the x-range into intervals such that the resulting y-values have a small probability of overlap, leading to decodable values.
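The link between noise level and decodability can be checked with a small worked calculation. The sketch assumes a binary protocol in which the sender emits a pre-noise activation $m$, the channel adds Gaussian noise with standard deviation $\sigma$ before a logistic squashing, and the receiver thresholds the result at 0.5. Because the logistic is monotonic, the message is decoded correctly exactly when the noise does not flip the sign of $m$, which happens with probability $\Phi(|m|/\sigma)$. The function names and the chosen values of $m$ and $\sigma$ are illustrative assumptions.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def decode_accuracy(m, sigma):
    """Probability that m + noise keeps the sign of m, noise ~ N(0, sigma^2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return normal_cdf(abs(m) / sigma)


# Decoding accuracy for a small and a large activation at two noise levels.
accuracy_table = {
    (m, sigma): round(decode_accuracy(m, sigma), 3)
    for m in (1.0, 4.0)
    for sigma in (0.5, 2.0)
}
```

Under these assumptions, an activation of 1.0 is decoded reliably at $\sigma = 0.5$ but only about 69% of the time at $\sigma = 2$, whereas an activation of 4.0 stays reliable at both levels. Higher noise therefore rewards senders that push activations further apart.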

Conclusion

The paper contributes novel environments and methods for learning communication protocols, and shows how differentiable communication channels can be used within deep MARL. It highlights key factors that affect the learning process, including architecture design, parameter sharing, and gradient propagation across agents.

The study represents a promising step towards unraveling communication and language learning within AI systems, establishing ground for addressing broader communication and compositionality challenges in AI agent coordination.

Figure 5: Final evaluation performance on multi-step MNIST of DIAL normalised by training performance after 50K epochs, under different noise regularisation levels $\sigma \in \{0, 0.5, 1, 1.5, 2\}$