
Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks

Published 24 Jun 2023 in cs.LG (arXiv:2306.13831v1)

Abstract: We present the Minigrid and Miniworld libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widescale adoption by the RL community, facilitating research in a wide range of areas. In this paper, we outline the design philosophy, environment details, and their world generation API. We also showcase the additional capabilities brought by the unified API between Minigrid and Miniworld through case studies on transfer learning (for both RL agents and humans) between the different observation spaces. The source code of Minigrid and Miniworld can be found at https://github.com/Farama-Foundation/{Minigrid, Miniworld} along with their documentation at https://{minigrid, miniworld}.farama.org/.


Summary

  • The paper presents Minigrid and Miniworld libraries as modular and customizable RL environments designed for diverse goal-oriented tasks.
  • It demonstrates a unified API and extensibility, facilitating integration with frameworks like Stable-Baselines3 for rapid prototyping.
  • Case studies reveal effective transfer learning between 2D and 3D environments, enhancing both RL agent performance and human decision-making.

Modular and Customizable RL Environments for Goal-Oriented Tasks

The paper presents Minigrid and Miniworld libraries designed to facilitate reinforcement learning (RL) research through modular and goal-oriented environments. These libraries enable users to construct a diverse set of environments with ease, providing a flexible platform for developing and testing RL algorithms. Both libraries have been widely adopted by the RL community, emphasizing their impact on fostering innovative research methodologies.

Design and Capabilities of Minigrid and Miniworld

Both Minigrid and Miniworld follow a minimalistic design philosophy intended to ensure ease of use and configurability. The two libraries focus on instruction-following, navigation-based, and goal-oriented tasks.

  • Minigrid: A 2D grid world in which each tile can be occupied by one of a set of objects, enabling diverse mission configurations. The design favors deterministic dynamics, which simplifies analysis and control for RL agents (Figure 1).

    Figure 1: Example environments from Minigrid and Miniworld.

  • Miniworld: A 3D environment composed of connected rooms containing objects, offering a more complex but structurally familiar setting. Observations are rendered from the agent's first-person perspective, so agents must learn from egocentric visual input in three-dimensional space.
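The tile-and-object world model behind these environments can be illustrated with a small self-contained sketch. The names below (`Grid`, `place`, `gen_grid`) are illustrative, not the libraries' actual world-generation API:

```python
# Minimal sketch of grid-based world generation, loosely inspired by the
# tile/object model described above. Not the Minigrid API.

class Grid:
    """A width x height grid where each cell holds at most one object name."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.cells = [[None] * width for _ in range(height)]

    def place(self, x, y, obj):
        if self.cells[y][x] is not None:
            raise ValueError(f"cell ({x}, {y}) already occupied")
        self.cells[y][x] = obj

    def get(self, x, y):
        return self.cells[y][x]


def gen_grid(width=5, height=5):
    """Generate a single room: walls on the border, a goal in one corner."""
    grid = Grid(width, height)
    for x in range(width):
        grid.place(x, 0, "wall")
        grid.place(x, height - 1, "wall")
    for y in range(1, height - 1):
        grid.place(0, y, "wall")
        grid.place(width - 1, y, "wall")
    grid.place(width - 2, height - 2, "goal")
    return grid
```

In both libraries, environment authors write a short generation routine in this spirit, which is what makes new research-specific environments quick to build.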

Both libraries model their tasks as partially observable Markov decision processes (POMDPs) and expose a rich set of controllable parameters, such as state transitions and reward functions.
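Partial observability can be sketched as cropping an egocentric window from the full state. This is a simplified stand-in for the agent-view observations both libraries expose, not their actual implementation:

```python
# Sketch of a partial observation: the agent sees only a local window of
# the full grid, with out-of-bounds cells padded as "wall". Illustrative only.

def partial_obs(grid, agent_x, agent_y, view_size=3):
    """Return a view_size x view_size window centred on the agent."""
    half = view_size // 2
    window = []
    for dy in range(-half, half + 1):
        row = []
        for dx in range(-half, half + 1):
            x, y = agent_x + dx, agent_y + dy
            if 0 <= x < len(grid[0]) and 0 <= y < len(grid):
                row.append(grid[y][x])
            else:
                row.append("wall")  # out-of-bounds reads as wall
        window.append(row)
    return window
```

Because the agent never observes the full grid, it must integrate information over time, which is what makes these tasks POMDPs rather than fully observable MDPs.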

Unified API and Extensibility

The unified API provided by the libraries allows easy creation of custom environments and integration into existing RL frameworks such as Stable-Baselines3 (SB3). The design maintains a balance between reducing the complexity of environment setup and offering robust configurability to suit advanced research needs. Users can quickly prototype and extend environments, supporting novel research applications without in-depth knowledge of underlying software architectures.
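Both libraries expose the Gymnasium-style reset/step contract, and it is this shared five-tuple interface that makes drop-in use with frameworks like SB3 possible. A minimal self-contained sketch of the contract (`GoalLineEnv` is a hypothetical toy task, not part of either library):

```python
# Sketch of the Gymnasium-style reset/step contract that Minigrid and
# Miniworld environments follow. GoalLineEnv is a toy 1D task invented
# here for illustration.

class GoalLineEnv:
    """1D corridor: move right to reach the goal cell."""

    def __init__(self, length=5, max_steps=20):
        self.length, self.max_steps = length, max_steps

    def reset(self, seed=None):
        self.pos, self.steps = 0, 0
        return self.pos, {}  # observation, info

    def step(self, action):
        # action: 0 = move left, 1 = move right
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + delta))
        self.steps += 1
        terminated = self.pos == self.length - 1  # reached the goal
        truncated = self.steps >= self.max_steps  # ran out of time
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}
```

A training loop written against this interface works unchanged across environments, regardless of whether observations come from a 2D grid or a 3D renderer.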

Case Studies on Transfer Learning

RL Agent Transfer Across Observation Spaces

A case study demonstrates how model weights can be transferred between Minigrid and Miniworld, focusing on effectively reusing learned policies. By retaining components whose inputs are unchanged across the two libraries, namely the mission encoder and the critic network, RL performance can be improved on Miniworld tasks that share core structure with Minigrid tasks (Figure 2).
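At its core, this kind of transfer amounts to copying the parameters of shared components from a source model into a fresh target model. A hedged pure-Python sketch (the parameter names and prefixes are illustrative; the paper's actual models are PyTorch networks):

```python
# Sketch of partial weight transfer: copy only the shared components
# (here "mission_encoder" and "critic") from a source model's parameters
# into a target model, leaving the rest (e.g. the image encoder, which
# differs between 2D and 3D observations) freshly initialised.

def transfer_shared(source_params, target_params,
                    shared_prefixes=("mission_encoder.", "critic.")):
    """Return target parameters with shared components overwritten from source."""
    merged = dict(target_params)
    for name, value in source_params.items():
        if name.startswith(shared_prefixes):  # str.startswith accepts a tuple
            merged[name] = value
    return merged
```

With PyTorch models, the analogous operation is filtering one model's `state_dict` by parameter-name prefix before loading it into the other.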

Figure 2: Visualization of the miniworld-gotoobj-env (left) and minigrid-gotoobj-env (right).

Human Transfer Learning

Human performance was compared across analogous Minigrid and Miniworld environments, highlighting how familiarity with an environment improves decision efficiency. Recorded trajectories show consistent improvement in task completion time when subjects gain experience in the simpler 2D grid layout before transitioning to the 3D environment (Figure 3).

Figure 3: Trajectories from one human subject when testing transferring experience on Minigrid environments to Miniworld.

Minigrid and Miniworld fill a niche by focusing explicitly on goal-oriented tasks in both 2D and 3D spaces, compared to other RL simulation platforms that emphasize single-task scenarios. The paper addresses the need for flexible and generalized environments capable of adapting to diverse RL challenges, setting the groundwork for subsequent advancements in instructional and navigational learning contexts within artificial intelligence.

Conclusion

The Minigrid and Miniworld libraries present a well-defined paradigm for creating flexible, scalable, and extensible environments critical for modern RL research. By offering unified APIs and a streamlined setup process, these libraries significantly lower the barrier to entry for researchers applying advanced RL methods across varied task settings. Future work focuses on enhancing real-world applicability and expanding the scope of instructional tasks and situational complexities these environments can handle. Limitations include computational constraints inherent to Python-based implementations and potential divergences from real-world dynamics that may affect policy transfer.
