- The paper demonstrates an RL-trained controller that achieves a record top speed of 3.9 m/s for agile maneuvers on the MIT Mini Cheetah.
- It employs adaptive curriculum learning and online system identification to bridge the sim-to-real gap across diverse terrains.
- Experimental results highlight robust agility and rapid recovery from disturbances on grass, ice, and gravel surfaces.
Rapid Locomotion via Reinforcement Learning
The paper "Rapid Locomotion via Reinforcement Learning" presents an end-to-end approach for training a neural network controller that achieves record agility in legged robots, specifically the MIT Mini Cheetah. This system is based on reinforcement learning (RL) and is shown to be capable of robust high-speed maneuvers on various types of terrain, including grass, ice, and gravel. The methodology outlined in the paper emphasizes two primary components: adaptive curriculum learning and online system identification for sim-to-real transfer.
Key Features of the Approach
This research showcases a system that achieves a sustained top speed of 3.9 m/s for the MIT Mini Cheetah on flat ground, a remarkable performance benchmark for this robot platform. The core of the system is a neural network policy trained using RL in simulation and applied directly in reality, without additional fine-tuning, thanks to effective sim-to-real transfer techniques.
- Adaptive Curriculum Learning: To effectively train the neural network in simulation, an adaptive curriculum on velocity commands was utilized. This curriculum gradually expands the range and difficulty of velocity commands during training, enabling the policy to learn across a broad spectrum of task difficulties.
- Online System Identification: For transferring the learned policy from simulation to reality, an online system identification strategy was employed. This allows the robot to adapt its learned behaviors to variations in real-world conditions, such as changes in terrain characteristics.
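The adaptive curriculum idea can be sketched as follows. This is a minimal illustration, not the paper's actual training code: the class name, the initial and maximum speed bounds, and the promotion threshold are all hypothetical, but the core mechanism matches the description above, namely that the sampled command range widens only once the policy tracks current commands reliably.

```python
import random


class VelocityCurriculum:
    """Hypothetical sketch of an adaptive curriculum on velocity commands:
    the command range expands as tracking performance improves."""

    def __init__(self, v_init=1.0, v_max=4.0, step=0.25, success_threshold=0.8):
        self.v_limit = v_init              # current bound on commanded speed (m/s)
        self.v_max = v_max                 # hard cap on the command range
        self.step = step                   # widening per promotion
        self.success_threshold = success_threshold

    def sample_command(self):
        """Draw a forward-velocity command from the current range."""
        return random.uniform(-self.v_limit, self.v_limit)

    def update(self, tracking_success_rate):
        """Widen the command range once tracking at the current level is reliable."""
        if tracking_success_rate >= self.success_threshold:
            self.v_limit = min(self.v_limit + self.step, self.v_max)


cur = VelocityCurriculum()
cur.update(0.9)            # good tracking, so the range expands by one step
print(cur.v_limit)         # → 1.25
```

In a training loop, `update` would be called periodically with the policy's recent tracking statistics, so early training sees only easy, slow commands and later training covers the full high-speed range.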
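The online system identification component can be illustrated schematically. In the sketch below (an assumption-laden toy, not the paper's implementation), an adaptation module maps a short history of observations and actions to an estimate of latent environment parameters, and the control policy consumes that estimate as an extra input; both maps are stand-in linear functions with made-up dimensions.

```python
import numpy as np


def adaptation_module(history, W):
    """Toy stand-in for online system identification: estimate latent
    environment parameters (e.g. friction, payload) from recent
    observation-action history via a linear map."""
    return W @ history.flatten()


def policy(obs, latent, K_obs, K_latent):
    """Toy linear policy conditioned on the current observation and the
    online parameter estimate."""
    return K_obs @ obs + K_latent @ latent


rng = np.random.default_rng(0)
H, obs_dim, act_dim, latent_dim = 5, 8, 4, 3       # illustrative sizes

history = rng.standard_normal((H, obs_dim + act_dim))  # last H (obs, action) pairs
W = rng.standard_normal((latent_dim, H * (obs_dim + act_dim)))
K_obs = rng.standard_normal((act_dim, obs_dim))
K_latent = rng.standard_normal((act_dim, latent_dim))

obs = rng.standard_normal(obs_dim)
latent = adaptation_module(history, W)   # re-estimated every control step
action = policy(obs, latent, K_obs, K_latent)
print(action.shape)                      # → (4,)
```

Because the latent estimate is refreshed online from the robot's own recent experience, the same policy can adjust its behavior as terrain or dynamics change, which is the mechanism the paper credits for closing the sim-to-real gap.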
Experimental Results and Observations
The results demonstrate the system's capabilities across diverse environments and scenarios, including agile rapid turning and sprinting on challenging terrains and robust responses to unforeseen disturbances such as slips and hardware failures. Notably, this robustness is achieved despite training exclusively on flat terrain.
The research additionally examines the sim-to-real gap, demonstrating that the online system identification component effectively mitigates discrepancies between simulated and real-world execution. The paper reports quantitative speed and agility metrics that validate the approach against existing model-predictive control baselines.
Implications and Future Directions
Practically, the research expands the horizon for deploying low-cost quadruped robots in complex environments without extensive custom engineering. Theoretically, the methodology underscores the potential of RL to solve intricate control problems with minimal sensor input and minimal human effort in modeling.
Future research directions proposed include extending the system's capabilities with vision-based inputs and applying the methodology to a broader array of locomotion tasks beyond velocity tracking. Furthermore, optimizing additional objectives, such as energy efficiency alongside agility, represents a promising avenue for refining learned policies to better align with specific operational needs.
In summary, this paper synthesizes advances in reinforcement learning to deliver a high-performing robotic control system, marking a significant stride in autonomous robotics and machine learning.