Replicating centralized MAPF solvers with MAPF-GPT

Determine how effectively MAPF-GPT, a decentralized imitation-learning policy for multi-agent pathfinding, can replicate the behavior of centralized MAPF solvers other than LaCAM, specifically the optimal Conflict-Based Search (CBS) algorithm.

Background

MAPF-GPT is trained via supervised imitation learning on a large dataset of expert demonstrations generated by the centralized LaCAM solver, producing a decentralized policy that uses only local observations without communication. The model achieves strong zero-shot performance across multiple benchmarks.
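The supervised imitation objective described above can be illustrated with a minimal sketch. It assumes the policy outputs one score per action for each local observation and that the expert (LaCAM here, or another solver such as CBS) supplies the action it took in that state. The five-action set, its ordering, and the function names are illustrative assumptions, not the authors' implementation, and the plain-Python loss stands in for whatever training framework is actually used.

```python
import math

# Assumed action set for a grid-based agent (hypothetical ordering).
ACTIONS = ["wait", "up", "down", "left", "right"]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def behavior_cloning_loss(logits_batch, expert_actions):
    """Mean negative log-likelihood of the expert's actions under the policy.

    logits_batch: one list of per-action scores for each local observation.
    expert_actions: index into ACTIONS chosen by the expert for each observation.
    """
    if not expert_actions:
        raise ValueError("need at least one demonstration step")
    if len(logits_batch) != len(expert_actions):
        raise ValueError("need one expert action per observation")
    total = 0.0
    for logits, action in zip(logits_batch, expert_actions):
        total -= log_softmax(logits)[action]
    return total / len(expert_actions)
```

Under this formulation, changing the expert changes only the action labels in the dataset; the loss itself is unchanged.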

The authors note that, although MAPF-GPT is effective when trained on LaCAM trajectories, it remains unclear whether it can similarly imitate other centralized MAPF solvers, such as Conflict-Based Search (CBS), which returns optimal solutions. Resolving this would show how well MAPF-GPT generalizes across expert policies and would guide data collection and training strategies.
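One simple way to quantify "replicating the behavior" of an expert is per-step action agreement on held-out demonstrations. The sketch below is an assumption about how such a measurement could look, not a protocol from the paper: `policy` is any hypothetical callable mapping a local observation to an action, and `demonstrations` is a list of (observation, expert action) pairs recorded from the expert under study.

```python
def action_agreement(policy, demonstrations):
    """Fraction of held-out steps where the policy picks the expert's action.

    policy: callable mapping an observation to an action (hypothetical interface).
    demonstrations: list of (observation, expert_action) pairs from the expert.
    """
    if not demonstrations:
        raise ValueError("demonstrations must be non-empty")
    matches = sum(
        1 for observation, expert_action in demonstrations
        if policy(observation) == expert_action
    )
    return matches / len(demonstrations)


# Toy usage: a policy that always waits, scored against four expert steps.
always_wait = lambda observation: "wait"
held_out = [("obs_a", "wait"), ("obs_b", "up"), ("obs_c", "wait"), ("obs_d", "left")]
agreement = action_agreement(always_wait, held_out)
```

Agreement of this kind is only a proxy: a policy can differ from the expert at individual steps and still solve the instance, so it would likely be reported alongside task-level measures such as success rate and solution cost.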

References

It is also unclear how effectively MAPF-GPT can replicate the behavior of the other existing centralized approaches (such as CBS that is an optimal MAPF solver). This dependence on the type of behavioral expert policy requires further research.

MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale (arXiv:2409.00134, Andreychuk et al., 2024), Appendix: Limitations