A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds
Abstract: Open-ended worlds are those in which there are no pre-specified goals or environmental reward signals. Consequently, an agent must know how to perform a multitude of tasks. However, when a new task is presented, we expect the agent to reuse some of what it already knows from previous tasks in order to rapidly learn the new one. We introduce a novel technique in which policies for different a priori known tasks are combined into a Mixture-of-Experts model with an attention mechanism over a mix of frozen and unfrozen experts. The model learns to attend to frozen task-specific experts when appropriate and learns new experts to handle novel situations. We work in an open-ended text-based environment in which the agent is tasked with behaving like different types of character roles and must rapidly learn the behaviors associated with new role types. We show that our agent both obtains more reward in the zero-shot setting and discovers rewards with greater sample efficiency in the few-shot setting.
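The core mechanism described above (an attention gate that mixes frozen task-specific expert policies with trainable new experts) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the flat per-expert gate scores, and the shared discrete action space are all assumptions made for clarity.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_policy(state, experts, gate_scores):
    """Attention-weighted mixture over expert action distributions.

    state: environment observation passed to every expert (hypothetical).
    experts: callables mapping a state to a probability distribution over a
        shared action space; frozen experts keep fixed parameters, while new
        experts and the gate would be the only components updated in training.
    gate_scores: one raw attention score per expert, produced by the gate.
    """
    attn = softmax(gate_scores)
    dists = [expert(state) for expert in experts]
    n_actions = len(dists[0])
    # Mix the distributions action-by-action; the result is itself a
    # valid probability distribution since the attention weights sum to 1.
    return [sum(w * d[a] for w, d in zip(attn, dists)) for a in range(n_actions)]
```

For example, with two experts assigning [0.9, 0.1] and [0.2, 0.8] over two actions and equal gate scores, the mixture is [0.55, 0.45]; learning to shift the gate scores toward one expert recovers that expert's behavior on its known task.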