SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

Published 22 Jan 2025 in cs.LG, cs.AI, and cs.MA | (2501.13200v1)

Abstract: Multi-agent reinforcement learning (MARL) demonstrates significant progress in solving cooperative and competitive multi-agent problems in various environments. One of the principal challenges in MARL is the need for explicit prediction of the agents' behavior to achieve cooperation. To resolve this issue, we propose the Shared Recurrent Memory Transformer (SRMT) which extends memory transformers to multi-agent settings by pooling and globally broadcasting individual working memories, enabling agents to exchange information implicitly and coordinate their actions. We evaluate SRMT on the Partially Observable Multi-Agent Pathfinding problem in a toy Bottleneck navigation task that requires agents to pass through a narrow corridor and on a POGEMA benchmark set of tasks. In the Bottleneck task, SRMT consistently outperforms a variety of reinforcement learning baselines, especially under sparse rewards, and generalizes effectively to longer corridors than those seen during training. On POGEMA maps, including Mazes, Random, and MovingAI, SRMT is competitive with recent MARL, hybrid, and planning-based algorithms. These results suggest that incorporating shared recurrent memory into the transformer-based architectures can enhance coordination in decentralized multi-agent systems. The source code for training and evaluation is available on GitHub: https://github.com/Aloriosa/srmt.


Summary

The paper introduces SRMT, a model that pools agents’ memories to enable decentralized coordination in multi-agent reinforcement learning tasks.
It integrates self-attention and cross-attention modules to fuse individual observations with global context, significantly improving pathfinding performance.
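Experimental results show that SRMT outperforms reinforcement learning baselines on the Bottleneck task, especially under sparse rewards, generalizes to corridors longer than those seen in training, and is competitive with recent MARL, hybrid, and planning-based algorithms on POGEMA maps.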

An Analysis of "SRMT: Shared Memory for Multi-agent Lifelong Pathfinding"

Introduction

The paper "SRMT: Shared Memory for Multi-agent Lifelong Pathfinding" (2501.13200) proposes the Shared Recurrent Memory Transformer (SRMT), a novel architecture aimed at improving coordination among agents in multi-agent reinforcement learning (MARL) settings. Unlike traditional methods requiring explicit communication protocols, SRMT facilitates implicit information exchange using shared memory, enhancing agents' ability to collaborate effectively in environments without centralized control.

The study builds upon existing transformer-based memory architectures, extending them into multi-agent settings. Previous approaches in MARL, such as MAMBA, QPLEX, and ATM, rely heavily on explicit communication or centralized training frameworks. In contrast, SRMT enables decentralized coordination by pooling individual agent memories into a globally accessible space, merging insights from cognitive theory with practical MARL applications.

SRMT Architecture

SRMT enhances the memory transformer architecture by integrating a shared memory mechanism, leveraging self-attention and cross-attention modules to process both individual and collective information. At each time step, agents access a pooled memory, which is subsequently updated based on their observations and actions. This setup allows agents to retain context and make informed decisions without explicit inter-agent communication.

Figure 1: Shared Recurrent Memory Transformer architecture. SRMT pools recurrent memories $mem_{i,t}$ of individual agents at a moment $t$ and provides global access to them via cross-attention.
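The pooling and cross-attention read described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-vector memory per agent, the `tanh` fusion standing in for the self-attention block, and every name here (`srmt_step`, `shared_memory_read`, `w_mem`, `w_obs`, `w_q`, `w_k`, `w_v`) are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def shared_memory_read(personal, w_q, w_k, w_v):
    """Each agent cross-attends to the pooled memories of all agents.

    personal: array of shape (n_agents, d), one memory vector per agent.
    Returns the per-agent context (n_agents, d) and the attention
    weights (n_agents, n_agents).
    """
    shared = personal                 # pooling: stack every agent's memory
    q = personal @ w_q                # queries come from the agent's own memory
    k = shared @ w_k                  # keys and values come from the shared pool
    v = shared @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def srmt_step(memory, obs_emb, params):
    """One illustrative time step: fuse, read shared memory, update memory."""
    # Hypothetical stand-in for the per-agent self-attention block:
    # mix the agent's previous memory with its current observation embedding.
    fused = np.tanh(memory @ params["w_mem"] + obs_emb @ params["w_obs"])
    # Cross-attention over the globally broadcast pool of memories.
    context, weights = shared_memory_read(
        fused, params["w_q"], params["w_k"], params["w_v"]
    )
    # Recurrent update: the new memory is carried to the next time step.
    new_memory = np.tanh(fused + context)
    return new_memory, weights


rng = np.random.default_rng(0)
n_agents, d = 4, 8
params = {
    name: rng.normal(scale=0.1, size=(d, d))
    for name in ("w_mem", "w_obs", "w_q", "w_k", "w_v")
}
memory = np.zeros((n_agents, d))
for _ in range(3):
    obs_emb = rng.normal(size=(n_agents, d))  # placeholder observation embeddings
    memory, weights = srmt_step(memory, obs_emb, params)
```

In this sketch no agent sends a message to another. Information flows only through the attention weights over the pool, which is one way to read the "implicit" exchange the paper describes.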

Experimental Evaluation

The SRMT model was evaluated using the POGEMA framework, focusing on both classical and lifelong multi-agent pathfinding (MAPF) tasks. In "Bottleneck" tasks, SRMT outperformed existing MARL and memory-based baselines, notably under sparse reward conditions where feedback was minimal. The architecture's ability to coordinate agent actions was crucial, particularly in environments requiring navigation through narrow corridors.

Figure 2: SRMT effectively solves the Bottleneck Task with different reward functions, showcasing superior performance under challenging conditions.

In lifelong MAPF scenarios, SRMT demonstrated robust generalization to unseen maps, outpacing traditional MARL baselines like MAMBA and QPLEX in terms of throughput and scalability. The incorporation of heuristic planning into SRMT further improved congestion management, implying that hybrid strategies combining learning-based and planning algorithms can significantly enhance performance in dense environments.

Figure 3: SRMT outperforms other MARL methods in different environments, showing robust generalization when evaluated on maps not seen during training.
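For readers unfamiliar with the throughput metric mentioned above, the following sketch shows one common way to compute it in lifelong MAPF, assuming throughput means the total number of goals reached by all agents divided by the episode length. The function name and the input format are hypothetical and are not taken from the paper or from the POGEMA code.

```python
def throughput(goals_reached_per_step):
    """Average number of goals reached per time step, summed over all agents.

    goals_reached_per_step: one non-negative integer per time step, giving
    how many agents reached their current goal at that step.
    """
    if not goals_reached_per_step:
        raise ValueError("episode must contain at least one time step")
    return sum(goals_reached_per_step) / len(goals_reached_per_step)


# Illustrative episode of 5 steps in which 4 goals are reached in total.
example = throughput([0, 1, 0, 2, 1])  # 4 goals / 5 steps = 0.8
```

Under this definition, higher throughput means agents keep completing goals despite congestion, which is why the metric suits the lifelong setting where each agent receives a new goal after reaching the previous one.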

Implications and Future Work

SRMT offers a decentralized approach to coordination in MARL that can improve scalability and robustness in multi-agent systems. By abstracting inter-agent communication into a shared memory construct, SRMT reduces reliance on centralized coordination, which eases deployment in real-world applications where such control is impractical. Future research might explore the integration of more sophisticated planning algorithms or extend the shared memory concept to additional MARL problem domains.

Conclusion

The introduction of SRMT marks a significant step in the evolution of multi-agent coordination strategies. By leveraging shared memory architectures, SRMT enhances the flexibility and scalability of MARL systems, paving the way for more efficient and adaptive solutions to complex pathfinding tasks. This research underscores the transformative potential of memory-augmented transformer networks in decentralized multi-agent environments, opening new avenues for theoretical exploration and practical application.