Scaling Team Coordination on Graphs with Reinforcement Learning
Abstract: This paper studies Reinforcement Learning (RL) techniques that enable team coordination behaviors in graph environments, where support actions among teammates reduce the cost of traversing certain risky edges, in a centralized manner. Classical approaches can solve this non-standard multi-agent path-planning problem by converting the original Environment Graph (EG) into a Joint State Graph (JSG) that implicitly incorporates the support actions, but they do not scale well to large graphs and teams. To address this curse of dimensionality, we propose to use RL so that agents learn such graph-traversal and teammate-supporting behaviors in a data-driven manner. Specifically, we formulate team coordination on graphs with risky edges as a Markov Decision Process (MDP) with a novel state and action space, and investigate how RL can solve it in two paradigms. First, we use RL for a team of agents to learn how to coordinate and reach the goal with minimal cost on a single EG; we show that RL efficiently solves problems with up to 20 nodes and 4 agents, or 25 nodes and 3 agents, in a fraction of the time the JSG approach needs for such complex problems. Second, we learn a general RL policy for any $N$-node EG that produces efficient supporting behaviors. We present extensive experiments and compare our RL approaches against their classical counterparts.
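To make the scaling argument concrete: in the JSG construction, a joint state assigns each agent to a node, so the joint state space grows exponentially with team size. The sketch below (illustrative only, not the authors' implementation, and ignoring any JSG-specific pruning) counts joint states for the problem sizes reported in the abstract:

```python
def joint_state_count(num_nodes: int, num_agents: int) -> int:
    # Each joint state places every agent on one of the graph's nodes,
    # so the JSG has up to num_nodes ** num_agents vertices before any
    # pruning of unreachable or redundant configurations.
    return num_nodes ** num_agents

# The largest instances the abstract reports RL solving efficiently:
print(joint_state_count(20, 4))  # 160000 joint states for 20 nodes / 4 agents
print(joint_state_count(25, 3))  # 15625 joint states for 25 nodes / 3 agents
```

Even before accounting for support actions, explicitly enumerating and searching a graph of this size is what makes the classical JSG approach expensive at scale, while an RL policy only needs to generalize over states it actually visits.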
- M. Limbu, Z. Hu, S. Oughourli, X. Wang, X. Xiao, and D. Shishika, “Team coordination on graphs with state-dependent edge cost,” in 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2023.
- A. Dorri, S. S. Kanhere, and R. Jurdak, “Multi-agent systems: A survey,” IEEE Access, vol. 6, pp. 28573–28593, 2018.
- J. Bellingham, “Autonomous ocean sampling network,” Moss Landing, CA: Monterey Bay Aquarium Research Institute, 2006.
- X. Xiao, J. Dufek, and R. R. Murphy, “Autonomous visual assistance for robot operations using a tethered UAV,” in Field and Service Robotics: Results of the 12th International Conference, pp. 15–29, Springer, 2021.
- X. Xiao, J. Dufek, T. Woodbury, and R. Murphy, “UAV assisted USV visual navigation for marine mass casualty incident response,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6105–6110, IEEE, 2017.
- X. Xiao, J. Dufek, and R. Murphy, “Visual servoing for teleoperation using a tethered UAV,” in 2017 IEEE International Symposium on Safety, Security and Rescue Robotics (SSRR), pp. 147–152, IEEE, 2017.
- B. Liu, X. Xiao, and P. Stone, “Team orienteering coverage planning with uncertain reward,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 9728–9733, IEEE, 2021.
- A. Khamis, A. Hussein, and A. Elmogy, “Multi-robot task allocation: A review of the state-of-the-art,” Cooperative robots and sensor networks 2015, pp. 31–51, 2015.
- W. Ren and R. W. Beard, “Consensus seeking in multiagent systems under dynamically changing interaction topologies,” IEEE Transactions on Automatic Control, vol. 50, no. 5, pp. 655–661, 2005.
- R. Olfati-Saber, “Flocking for multi-agent dynamic systems: Algorithms and theory,” IEEE Transactions on Automatic Control, vol. 51, no. 3, pp. 401–420, 2006.
- X. Wang, Y. Zhou, and W. Jin, “D3G: Learning multi-robot coordination from demonstrations,” arXiv preprint arXiv:2207.08892, 2022.
- J. Hart, R. Mirsky, X. Xiao, S. Tejeda, B. Mahajan, J. Goo, K. Baldauf, S. Owen, and P. Stone, “Using human-inspired signals to disambiguate navigational intentions,” in Social Robotics: 12th International Conference, ICSR 2020, Golden, CO, USA, November 14–18, 2020, Proceedings, pp. 320–331, Springer, 2020.
- M. Khonji, R. Alyassi, W. Merkt, A. Karapetyan, X. Huang, S. Hong, J. Dias, and B. Williams, “Multi-agent chance-constrained stochastic shortest path with application to risk-aware intelligent intersection,” arXiv preprint arXiv:2210.01766, 2022.
- K.-K. Oh, M.-C. Park, and H.-S. Ahn, “A survey of multi-agent formation control,” Automatica, vol. 53, pp. 424–440, 2015.
- Y. Zheng, Y. Zhu, and L. Wang, “Consensus of heterogeneous multi-agent systems,” IET Control Theory & Applications, vol. 5, no. 16, pp. 1881–1888, 2011.
- T. Rashid, M. Samvelyan, C. S. de Witt, G. Farquhar, J. Foerster, and S. Whiteson, “QMIX: Monotonic value function factorisation for deep multi-agent reinforcement learning,” 2018.
- P. Sunehag, G. Lever, A. Gruslys, W. M. Czarnecki, V. Zambaldi, M. Jaderberg, M. Lanctot, N. Sonnerat, J. Z. Leibo, K. Tuyls, and T. Graepel, “Value-decomposition networks for cooperative multi-agent learning,” 2017.
- C. Yu, A. Velu, E. Vinitsky, J. Gao, Y. Wang, A. Bayen, and Y. Wu, “The surprising effectiveness of PPO in cooperative, multi-agent games,” 2022.
- R. Lowe, Y. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” 2020.
- T. Chu, J. Wang, L. Codecà, and Z. Li, “Multi-agent deep reinforcement learning for large-scale traffic signal control,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 3, pp. 1086–1095, 2019.
- C. J. C. H. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, pp. 279–292, May 1992.
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal policy optimization algorithms,” arXiv preprint arXiv:1707.06347, 2017.
- S. Li, J. K. Gupta, P. Morales, R. Allen, and M. J. Kochenderfer, “Deep implicit coordination graphs for multi-agent reinforcement learning,” 2021.
- W. Böhmer, V. Kurin, and S. Whiteson, “Deep coordination graphs,” 2020.
- S. Huang and S. Ontañón, “A closer look at invalid action masking in policy gradient algorithms,” The International FLAIRS Conference Proceedings, vol. 35, May 2022.