
Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks

Published 24 Jun 2023 in cs.LG (arXiv:2306.13831v1)

Abstract: We present the Minigrid and Miniworld libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widescale adoption by the RL community, facilitating research in a wide range of areas. In this paper, we outline the design philosophy, environment details, and their world generation API. We also showcase the additional capabilities brought by the unified API between Minigrid and Miniworld through case studies on transfer learning (for both RL agents and humans) between the different observation spaces. The source code of Minigrid and Miniworld can be found at https://github.com/Farama-Foundation/{Minigrid, Miniworld} along with their documentation at https://{minigrid, miniworld}.farama.org/.


Summary

  • The paper presents Minigrid and Miniworld libraries as modular and customizable RL environments designed for diverse goal-oriented tasks.
  • It demonstrates a unified API and extensibility, facilitating integration with frameworks like Stable-Baselines3 for rapid prototyping.
  • Case studies reveal effective transfer learning between 2D and 3D environments, enhancing both RL agent performance and human decision-making.

Modular and Customizable RL Environments for Goal-Oriented Tasks

The paper presents Minigrid and Miniworld libraries designed to facilitate reinforcement learning (RL) research through modular and goal-oriented environments. These libraries enable users to construct a diverse set of environments with ease, providing a flexible platform for developing and testing RL algorithms. Both libraries have been widely adopted by the RL community, emphasizing their impact on fostering innovative research methodologies.

Design and Capabilities of Minigrid and Miniworld

Both Minigrid and Miniworld follow a minimalistic design philosophy intended to ensure ease of use and configurability. The two libraries focus on instruction-following, navigation-based, and goal-oriented tasks.

  • Minigrid: A 2D grid world in which each tile can be occupied by one of a set of objects, enabling diverse mission configurations. The design favors deterministic dynamics, which simplifies analysis and control for RL agents (Figure 1).

    Figure 1: Example environments from Minigrid and Miniworld.

  • Miniworld: A 3D environment composed of connected rooms containing objects, offering a more complex but structurally familiar setting. Observations are rendered from the agent's first-person perspective, so agents must learn from egocentric visual input in three-dimensional space.
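The tile-and-object world model behind these environments can be illustrated with a small self-contained sketch. The names below (`Grid`, `place`, `gen_grid`) are illustrative, not the libraries' actual world-generation API:

```python
# Minimal sketch of grid-based world generation, loosely inspired by the
# tile/object model described above. Not the Minigrid API.

class Grid:
    """A width x height grid where each cell holds at most one object name."""

    def __init__(self, width, height):
        self.width, self.height = width, height
        self.cells = [[None] * width for _ in range(height)]

    def place(self, x, y, obj):
        if self.cells[y][x] is not None:
            raise ValueError(f"cell ({x}, {y}) already occupied")
        self.cells[y][x] = obj

    def get(self, x, y):
        return self.cells[y][x]


def gen_grid(width=5, height=5):
    """Generate a single room: walls on the border, a goal in one corner."""
    grid = Grid(width, height)
    for x in range(width):
        grid.place(x, 0, "wall")
        grid.place(x, height - 1, "wall")
    for y in range(1, height - 1):
        grid.place(0, y, "wall")
        grid.place(width - 1, y, "wall")
    grid.place(width - 2, height - 2, "goal")
    return grid
```

In both libraries, environment authors write a short generation routine in this spirit, which is what makes new research-specific environments quick to build.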

Both libraries model their tasks as partially observable Markov decision processes (POMDPs) and expose a rich set of controllable parameters, such as state transitions and reward functions.
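Partial observability can be sketched as cropping an egocentric window from the full state. This is a simplified stand-in for the agent-view observations both libraries expose, not their actual implementation:

```python
# Sketch of a partial observation: the agent sees only a local window of
# the full grid, with out-of-bounds cells padded as "wall". Illustrative only.

def partial_obs(grid, agent_x, agent_y, view_size=3):
    """Return a view_size x view_size window centred on the agent."""
    half = view_size // 2
    window = []
    for dy in range(-half, half + 1):
        row = []
        for dx in range(-half, half + 1):
            x, y = agent_x + dx, agent_y + dy
            if 0 <= x < len(grid[0]) and 0 <= y < len(grid):
                row.append(grid[y][x])
            else:
                row.append("wall")  # out-of-bounds reads as wall
        window.append(row)
    return window
```

Because the agent never observes the full grid, it must integrate information over time, which is what makes these tasks POMDPs rather than fully observable MDPs.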

Unified API and Extensibility

The unified API provided by the libraries allows easy creation of custom environments and integration into existing RL frameworks such as Stable-Baselines3 (SB3). The design maintains a balance between reducing the complexity of environment setup and offering robust configurability to suit advanced research needs. Users can quickly prototype and extend environments, supporting novel research applications without in-depth knowledge of underlying software architectures.
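Both libraries expose the Gymnasium-style reset/step contract, and it is this shared five-tuple interface that makes drop-in use with frameworks like SB3 possible. A minimal self-contained sketch of the contract (`GoalLineEnv` is a hypothetical toy task, not part of either library):

```python
# Sketch of the Gymnasium-style reset/step contract that Minigrid and
# Miniworld environments follow. GoalLineEnv is a toy 1D task invented
# here for illustration.

class GoalLineEnv:
    """1D corridor: move right to reach the goal cell."""

    def __init__(self, length=5, max_steps=20):
        self.length, self.max_steps = length, max_steps

    def reset(self, seed=None):
        self.pos, self.steps = 0, 0
        return self.pos, {}  # observation, info

    def step(self, action):
        # action: 0 = move left, 1 = move right
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + delta))
        self.steps += 1
        terminated = self.pos == self.length - 1  # reached the goal
        truncated = self.steps >= self.max_steps  # ran out of time
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}
```

A training loop written against this interface works unchanged across environments, regardless of whether observations come from a 2D grid or a 3D renderer.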

Case Studies on Transfer Learning

RL Agent Transfer Across Observation Spaces

A case study demonstrates how model weights can be transferred between Minigrid and Miniworld, focusing on effectively reusing learned policies. By retaining components whose inputs are unchanged across the two libraries, namely the mission encoder and the critic network, RL performance can be improved on Miniworld tasks that share core structure with Minigrid tasks (Figure 2).
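At its core, this kind of transfer amounts to copying the parameters of shared components from a source model into a fresh target model. A hedged pure-Python sketch (the parameter names and prefixes are illustrative; the paper's actual models are PyTorch networks):

```python
# Sketch of partial weight transfer: copy only the shared components
# (here "mission_encoder" and "critic") from a source model's parameters
# into a target model, leaving the rest (e.g. the image encoder, which
# differs between 2D and 3D observations) freshly initialised.

def transfer_shared(source_params, target_params,
                    shared_prefixes=("mission_encoder.", "critic.")):
    """Return target parameters with shared components overwritten from source."""
    merged = dict(target_params)
    for name, value in source_params.items():
        if name.startswith(shared_prefixes):  # str.startswith accepts a tuple
            merged[name] = value
    return merged
```

With PyTorch models, the analogous operation is filtering one model's `state_dict` by parameter-name prefix before loading it into the other.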

Figure 2: Visualization of the miniworld-gotoobj-env (left) and minigrid-gotoobj-env (right).

Human Transfer Learning

Human performance was compared across analogous Minigrid and Miniworld environments, highlighting how familiarity with an environment improves decision efficiency. Recorded trajectories show consistent improvement in task completion time when subjects gain experience in the simpler 2D grid layout before transitioning to the 3D environment (Figure 3).

Figure 3: Trajectories from one human subject when testing transferring experience on Minigrid environments to Miniworld.

Minigrid and Miniworld fill a niche by focusing explicitly on goal-oriented tasks in both 2D and 3D spaces, compared to other RL simulation platforms that emphasize single-task scenarios. The paper addresses the need for flexible and generalized environments capable of adapting to diverse RL challenges, setting the groundwork for subsequent advancements in instructional and navigational learning contexts within artificial intelligence.

Conclusion

The Minigrid and Miniworld libraries present a well-defined paradigm for creating flexible, scalable, and extensible environments critical for modern RL research. By offering unified APIs and a streamlined setup process, these libraries significantly lower the barrier to entry for researchers applying advanced RL methods across varied task settings. Future work focuses on enhancing real-world applicability and expanding the scope of instructional tasks and situational complexities these environments can handle. Limitations include computational constraints inherent to Python-based implementations and potential divergences from real-world dynamics that may affect policy transfer.
