
Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

Published 25 Oct 2024 in cs.AI (arXiv:2410.19450v1)

Abstract: Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance. However, most existing research has focused on single-agent settings, with limited exploration of the multi-agent extension, i.e., Offline-to-Online Multi-Agent Reinforcement Learning (O2O MARL). In O2O MARL, two critical challenges become more prominent as the number of agents increases: (i) the risk of unlearning pre-trained Q-values due to distributional shift during the transition from the offline phase to the online phase, and (ii) the difficulty of exploring the large joint state-action space efficiently. To tackle these challenges, we propose a novel O2O MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE). First, we introduce the Offline Value Function Memory (OVM) mechanism to compute target Q-values, preserving knowledge gained during offline training, ensuring smoother transitions, and enabling efficient fine-tuning. Second, we propose a decentralized Sequential Exploration (SE) strategy tailored for O2O MARL, which effectively utilizes the pre-trained offline policy for exploration, thereby significantly reducing the joint state-action space to be explored. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) demonstrate that OVMSE significantly outperforms existing baselines, achieving superior sample efficiency and overall performance.
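
The abstract describes OVM only at a high level (target Q-values computed so that knowledge from offline training is preserved), so the following Python/PyTorch sketch is just one plausible reading, in which fine-tuning targets are lower-bounded by a frozen copy of the offline Q-network. The function name `ovm_target`, its arguments, and the max-based rule are illustrative assumptions, not the paper's published update.

```python
import torch

def ovm_target(q_offline, q_target, next_obs, rewards, dones, gamma=0.99):
    """Fine-tuning targets lower-bounded by the frozen offline value.

    `q_offline` is a frozen copy of the Q-network from offline
    pre-training; `q_target` is the online target network. Both map a
    batch of observations (B, obs_dim) to per-action values
    (B, n_actions). ASSUMPTION: the max-based rule below is an
    illustrative guess at OVM, not the paper's actual mechanism.
    """
    with torch.no_grad():
        # Standard one-step TD target from the online target network.
        online_v = q_target(next_obs).max(dim=-1).values
        td_target = rewards + gamma * (1.0 - dones) * online_v

        # The same backup through the memorised offline value function.
        offline_v = q_offline(next_obs).max(dim=-1).values
        memory_target = rewards + gamma * (1.0 - dones) * offline_v

        # Never let early online noise drag targets below what the
        # offline phase already learned.
        return torch.maximum(td_target, memory_target)
```

The design intuition, as the abstract frames it, is that the distribution shift at the offline-to-online boundary can overwrite pre-trained Q-values; anchoring targets to a memorised offline estimate is one way such unlearning could be damped.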
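The Sequential Exploration (SE) strategy is likewise only sketched in the abstract. Below is a minimal hypothetical instantiation, assuming agents take turns so that a single designated agent explores epsilon-greedily while all others act greedily under their pre-trained offline policies; the turn-taking schedule and all names are assumptions for illustration.

```python
import random

def sequential_step(policies, observations, explorer_idx, epsilon, n_actions):
    """One decentralized action-selection step under an SE-style scheme.

    Only the agent whose turn it is explores; every other agent acts
    greedily with its pre-trained offline policy, so the space that
    exploration must cover shrinks to a single agent's action set.
    ASSUMPTION: `policies[i](obs)` returns per-action Q-values for
    agent i; the round-robin turn order is an illustrative choice.
    """
    actions = []
    for i, (policy, obs) in enumerate(zip(policies, observations)):
        if i == explorer_idx and random.random() < epsilon:
            actions.append(random.randrange(n_actions))  # exploratory action
        else:
            actions.append(int(policy(obs).argmax()))    # pre-trained greedy action
    return actions
```

A driver loop could rotate `explorer_idx = (explorer_idx + 1) % len(policies)` across episodes so each agent gets its turn to explore; whether the paper schedules turns per step, per episode, or adaptively is not stated in the abstract.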

