Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies
Abstract: In collaborative goal-oriented settings, the participants are not only interested in achieving a successful outcome; they also implicitly negotiate the effort they put into the interaction (by adapting to each other). In this work, we propose a challenging interactive reference game that requires two players to coordinate on vision and language observations. The learning signal in this game is a score (given after playing) that takes into account the achieved goal and the players' assumed efforts during the interaction. We show that a standard Proximal Policy Optimization (PPO) setup achieves a high success rate when bootstrapped with heuristic partner behaviors that implement insights from the analysis of human-human interactions. Furthermore, we find that a pairing of neural partners indeed reduces the measured joint effort when playing together repeatedly. However, compared to a reasonable heuristic pairing, there is still room for improvement, which invites further research in the direction of cost-sharing in collaborative interactions.
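The abstract describes a post-game score that combines goal achievement with the players' joint effort. A minimal sketch of such a score is shown below; the function name, the additive effort measure, and the specific weights are illustrative assumptions, not the paper's actual formula.

```python
# Hypothetical sketch of a post-game score that rewards task success
# while penalizing the players' combined effort. The success bonus,
# effort weight, and effort measure are assumptions for illustration.

def episode_score(success: bool,
                  giver_effort: float,
                  follower_effort: float,
                  success_bonus: float = 1.0,
                  effort_weight: float = 0.1) -> float:
    """Score given after playing: a bonus if the goal was reached,
    minus a penalty proportional to the players' joint effort."""
    joint_effort = giver_effort + follower_effort
    return (success_bonus if success else 0.0) - effort_weight * joint_effort

# Example: a successful game where the instruction giver spent 3 units
# of effort (e.g. utterances) and the follower 5 (e.g. moves),
# yielding roughly 0.2 (success bonus minus effort penalty).
print(episode_score(True, 3.0, 5.0))
```

Under such a score, two pairings with equal success rates can still be distinguished by how efficiently they coordinate, which is the comparison the abstract draws between neural and heuristic partner pairings.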