Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies

Published 26 Mar 2024 in cs.CL and cs.CV (arXiv:2403.17497v1)

Abstract: In collaborative goal-oriented settings, the participants are not only interested in achieving a successful outcome but also implicitly negotiate the effort they put into the interaction (by adapting to each other). In this work, we propose a challenging interactive reference game that requires two players to coordinate on vision and language observations. The learning signal in this game is a score (given after playing) that takes into account the achieved goal and the players' assumed efforts during the interaction. We show that a standard Proximal Policy Optimization (PPO) setup achieves a high success rate when bootstrapped with heuristic partner behaviors that implement insights from the analysis of human-human interactions. We also find that a pairing of neural partners indeed reduces the measured joint effort when playing together repeatedly. However, compared to a reasonable heuristic pairing there is still room for improvement, which invites further research in the direction of cost-sharing in collaborative interactions.
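
The abstract does not spell out how the post-game score combines success and effort. As a rough illustration only, a score of this kind could be a success indicator discounted by the players' joint effort; the function and parameter names below (episode_score, guide_steps, follower_steps, max_steps) and the linear discount are assumptions for the sketch, not details taken from the paper.

```python
def episode_score(success: bool,
                  guide_steps: int,
                  follower_steps: int,
                  max_steps: int = 100) -> float:
    """Hypothetical scoring rule for one episode of the reference game.

    A failed episode yields no reward; a successful one is discounted by
    the joint effort (here simply the number of steps both players took,
    relative to an assumed per-player step budget).
    """
    if not success:
        return 0.0
    joint_effort = guide_steps + follower_steps
    effort_ratio = joint_effort / (2 * max_steps)  # 0.0 = effortless, 1.0 = full budget
    return max(0.0, 1.0 - effort_ratio)


# Example: success reached with moderate joint effort
print(episode_score(True, guide_steps=30, follower_steps=20))  # 0.75
```

Under a scheme like this, both players are incentivized not just to finish the task but to do so while keeping their combined effort low, which is the cost-sharing pressure the paper studies.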
