
Learning Memory-Based Control for Human-Scale Bipedal Locomotion

Published 3 Jun 2020 in cs.RO (arXiv:2006.02402v1)

Abstract: Controlling a non-statically stable biped is a difficult problem largely due to the complex hybrid dynamics involved. Recent work has demonstrated the effectiveness of reinforcement learning (RL) for simulation-based training of neural network controllers that successfully transfer to real bipeds. The existing work, however, has primarily used simple memoryless network architectures, even though more sophisticated architectures, such as those including memory, often yield superior performance in other RL domains. In this work, we consider recurrent neural networks (RNNs) for sim-to-real biped locomotion, allowing for policies that learn to use internal memory to model important physical properties. We show that while RNNs significantly outperform memoryless policies in simulation, they do not exhibit superior behavior on the real biped unless trained using dynamics randomization to prevent overfitting to the simulation physics; this leads to consistently better sim-to-real transfer. We also show that RNNs can use their learned memory states to perform online system identification by encoding parameters of the dynamics into memory.

Citations (62)

Summary

  • The paper shows that memory-based RNN controllers outperform memoryless architectures in simulation for bipedal locomotion.
  • It employs reinforcement learning with dynamics randomization, enabling online system identification and enhanced control adaptability.
  • These results underscore the potential of memory-enabled systems to achieve robust and resilient robotic walking in varying dynamic environments.

This paper investigates the efficacy of recurrent neural networks (RNNs) for learning sim-to-real control policies for bipedal locomotion, targeting the robot Cassie from Agility Robotics. Unlike memoryless architectures, RNNs can use internal memory to infer important system dynamics that are not directly observable; the work emphasizes dynamics randomization during training to mitigate overfitting to the simulation.

Key Findings and Contributions

The study establishes several critical insights:

  • RNN vs. Memoryless Controllers: RNN-based policies considerably outperform memoryless architectures in simulation, but they struggle on real hardware due to overfitting to the specific simulation dynamics.
  • Dynamics Randomization: Introducing dynamics randomization during the training of RNN controllers results in improved transfer to actual hardware. This randomization involves varying simulation parameters to prevent policies from exploiting specific simulation dynamics, thus enhancing robustness.
  • System Identification: The paper explores the capability of RNNs to perform online system identification, where the network encodes parameters of the dynamics into its internal memory states, enhancing adaptive control under varied conditions.
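The dynamics randomization described above amounts to resampling the simulator's physical parameters at the start of each training episode, so the policy never sees a single fixed physics configuration. A minimal sketch of that sampling step follows; the parameter names and ranges are illustrative assumptions (the paper randomizes 61 parameters such as link masses, joint damping, and friction, with its own ranges):

```python
import random

# Hypothetical subset of the randomized dynamics parameters. The names
# and ranges are illustrative assumptions, not the paper's actual values.
PARAM_RANGES = {
    "ground_friction": (0.6, 1.2),   # scale factor on nominal friction
    "joint_damping": (0.8, 1.2),     # scale factor on nominal damping
    "link_mass_scale": (0.9, 1.1),   # scale factor on nominal link masses
}

def sample_dynamics(rng=None):
    """Draw a fresh set of dynamics parameters for one training episode,
    so the policy cannot exploit any single simulation configuration."""
    rng = rng or random.Random()
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}
```

Because the policy must succeed across every draw, it is pushed toward behaviors (or, for RNNs, inferred parameter estimates) that remain valid on the real robot's unknown dynamics.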

Methodology

The authors employ reinforcement learning (RL), specifically Proximal Policy Optimization (PPO), to train the RNN controllers. The policies manage bipedal walking by receiving inputs describing the robot's state, velocity commands, and a clock signal, and producing joint position commands. A reward function based on a reference trajectory aids the initial learning stages, and the trained policies are rigorously tested both in simulation and on the real robot.
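The policy input described above can be sketched as a flat observation vector. In the sketch below, the specific state components and the gait cycle period are illustrative assumptions, and the clock is encoded as (sin, cos) of the gait phase, a common choice for periodic inputs:

```python
import math

def build_observation(joint_pos, joint_vel, pelvis_orientation,
                      cmd_velocity, t, cycle_period=0.84):
    """Assemble the policy input: robot state, commanded velocity, and a
    cyclic clock signal encoded as (sin, cos) of the gait phase.
    The component list and the 0.84 s cycle period are illustrative
    assumptions, not the paper's exact specification."""
    phase = 2.0 * math.pi * (t % cycle_period) / cycle_period
    clock = [math.sin(phase), math.cos(phase)]
    return (list(joint_pos) + list(joint_vel) + list(pelvis_orientation)
            + list(cmd_velocity) + clock)
```

Encoding the clock as a (sin, cos) pair keeps the input continuous across cycle boundaries, unlike a raw sawtooth phase signal.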

Simulation and Hardware Outcomes

In simulation studies, RNN controllers trained with dynamics randomization demonstrate superior robustness across varied dynamics, walking for longer before falling than controllers trained without randomization or with memoryless architectures. Simulation testing evaluates controllers under 61 randomized dynamics parameters. On hardware, RNN controllers trained with dynamics randomization consistently achieve stable walking gaits, whereas RNNs trained without randomization, as well as feedforward networks, exhibit instability.
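The robustness comparison above boils down to averaging time-to-failure over many freshly randomized dynamics draws. A minimal evaluation harness might look like the following, where `run_episode` and `sample_dynamics` are assumed hooks, not APIs from the paper:

```python
import random

def evaluate_robustness(run_episode, sample_dynamics, n_trials=100, seed=0):
    """Average walking time before falling, over randomized dynamics.
    Assumed interfaces: run_episode(params) returns seconds walked before
    a fall; sample_dynamics(rng) returns a dynamics-parameter dict."""
    rng = random.Random(seed)
    times = [run_episode(sample_dynamics(rng)) for _ in range(n_trials)]
    return sum(times) / n_trials
```

Fixing the seed makes comparisons between controllers reproducible, since each one is evaluated on the same sequence of dynamics draws.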

Training RNNs under diverse simulated dynamics encourages them to encode information that also transfers to unmodeled real-world variation. The study uses principal component analysis (PCA) to visualize the latent memory states, showing that RNNs capture the cyclic structure essential to bipedal locomotion.
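Projecting high-dimensional RNN memory states onto their leading principal components is what makes such cyclic structure visible. As an illustrative stand-in for the paper's PCA visualization, the sketch below estimates the first principal component of a set of latent-state vectors via power iteration on the covariance matrix:

```python
import math
import random

def first_principal_component(data, iters=200, seed=0):
    """Estimate the first principal component of a list of latent-state
    vectors (lists of floats) via power iteration, applying the
    covariance matrix implicitly as w = X^T (X v) / n."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random initial direction
    for _ in range(iters):
        proj = [sum(row[j] * v[j] for j in range(d)) for row in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) / n
             for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize each iteration
    return v
```

In practice the latent trajectory is projected onto the first two or three such components; a roughly closed loop in that projection reflects the periodic gait cycle.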

Theoretical Implications and Future Directions

The work illustrates the potential of memory-enabled neural control policies in complex dynamic environments, encouraging further exploration of RNN architectures for embodied agents facing high dynamical variation. Suggested directions for future research include determining which disturbances an RNN must explicitly encode in memory rather than simply accommodate, and exploring broader implications of memory-based systems for adaptive learning mechanisms.

Overall, this paper advances current understanding of memory-based control systems and provides empirical evidence supporting the integration of dynamics randomization for sim-to-real transfers, marking a step toward more efficient and resilient robotic locomotion control methodologies.
