Imitation Learning from MPC for Quadrupedal Multi-Gait Control

Published 26 Mar 2021 in cs.RO, cs.AI, cs.SY, and eess.SY | (2103.14331v1)

Abstract: We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot. To achieve this, we use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control (MPC). The strategy of MPC-Net differs from many other approaches since its objective is to minimize the control Hamiltonian, which derives from the principle of optimality. To represent the policies, we employ a mixture-of-experts network (MEN) and observe that the performance of a policy improves if each expert of a MEN specializes in controlling exactly one mode of a hybrid system, such as a walking robot. We introduce new loss functions for single- and multi-gait policies to achieve this kind of expert selection behavior. Moreover, we benchmark our algorithm against Behavioral Cloning and the original MPC implementation on various rough terrain scenarios. We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.

Summary

The paper uses and extends MPC-Net, an imitation learning approach guided by MPC, to train a single policy that imitates multiple gaits of a quadrupedal robot.
It represents the policy with a mixture-of-experts network and introduces new loss functions that encourage each expert to specialize in exactly one mode of the hybrid system.
The approach is benchmarked against Behavioral Cloning and the original MPC implementation on rough terrain scenarios and validated on hardware, where a single learned policy replaces its MPC teacher for multiple gaits.

An Expert Overview of "Imitation Learning from MPC for Quadrupedal Multi-Gait Control"

The paper "Imitation Learning from MPC for Quadrupedal Multi-Gait Control" addresses the control of quadrupedal robots with a single policy that covers multiple gaits, learned by imitation learning (IL) from model predictive control (MPC). Authored by researchers from ETH Zürich, the study combines optimal control theory with deep learning to improve legged locomotion.

The core contribution is an extension of MPC-Net, an MPC-guided imitation learning approach, so that one locomotion policy can control multiple gaits. The objective is a single neural network policy that imitates several gaits demonstrated by an MPC teacher, instead of a separate policy per gait. This targets a common challenge in robotics: achieving real-time control that adapts to changing environments.

A central element of this research is the use of a mixture-of-experts network (MEN) as the policy architecture. A walking robot is a hybrid system with discrete modes, and the gaits considered, such as trot and static walk, are sequences of these modes. The authors observe that policy performance improves when each expert of the MEN specializes in controlling exactly one mode, and they introduce new loss functions for single- and multi-gait policies that encourage this expert selection behavior.
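The mixture-of-experts idea can be illustrated with a minimal NumPy sketch. It is not the authors' network: the linear experts, the linear gating layer, the class name `MixtureOfExperts`, and the function `expert_selection_loss` are all hypothetical, and the cross-entropy term is only one plausible way to tie each expert to one mode, assumed here for illustration.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class MixtureOfExperts:
    """Toy policy u(x) = sum_i p_i(x) * (W_i x + b_i).

    Assumption: linear experts and a linear gating layer, chosen for brevity.
    """

    def __init__(self, state_dim, control_dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        self.W_gate = rng.normal(scale=0.1, size=(num_experts, state_dim))
        self.W_exp = rng.normal(scale=0.1, size=(num_experts, control_dim, state_dim))
        self.b_exp = np.zeros((num_experts, control_dim))

    def forward(self, x):
        # Gating network: one selection probability per expert.
        p = softmax(self.W_gate @ x)               # shape (num_experts,)
        # Every expert proposes a control input for the same state.
        u_experts = self.W_exp @ x + self.b_exp    # shape (num_experts, control_dim)
        # The policy output is the probability-weighted sum of the proposals.
        u = p @ u_experts                          # shape (control_dim,)
        return u, p

def expert_selection_loss(p, mode, eps=1e-12):
    """Cross-entropy between gating probabilities and the active mode index.

    Assumption: expert i is meant to be responsible for mode i.
    """
    return -np.log(p[mode] + eps)

policy = MixtureOfExperts(state_dim=4, control_dim=2, num_experts=3)
u, p = policy.forward(np.ones(4))
print(u.shape, p.sum(), expert_selection_loss(p, mode=1))
```

Adding such a term to the imitation objective penalizes the gating network whenever it spreads probability over experts that are not assigned to the currently active mode.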

The reported results indicate performance gains from these loss functions. Deployment on the ANYmal robot demonstrates gait control on hardware, and the benchmarks against behavioral cloning (BC) and the original MPC implementation on rough terrain scenarios underscore MPC-Net's capacity to produce robust policies that hold up under varied operating conditions.

On the theoretical side, MPC-Net's objective is to minimize the control Hamiltonian, which derives from the principle of optimality. Because the Hamiltonian is built from the system dynamics and the cost of the underlying optimal control problem, the policy search accounts for the robot's physical dynamics and constraints rather than only matching demonstrated actions. Practically, this makes MPC-Net suitable for high-dimensional control problems in which one policy must cover several gaits.
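For readers unfamiliar with the control Hamiltonian, the textbook form is sketched below. The symbols are assumptions introduced for this illustration and are not taken from the paper: dynamics \dot{x} = f(t, x, u), stage cost l(t, x, u), optimal value function V(t, x), and policy \pi_\theta with parameters \theta. Constraints, where present, are assumed to enter through penalty or multiplier terms folded into l.

```latex
% Control Hamiltonian: stage cost plus value gradient along the dynamics
H(t, x, u) = l(t, x, u) + \partial_x V(t, x)^{\top} f(t, x, u)

% Principle of optimality (Hamilton-Jacobi-Bellman equation):
% the optimal input minimizes the Hamiltonian at every state
-\partial_t V(t, x) = \min_{u} H(t, x, u)

% Hamiltonian-based imitation objective: the policy is trained so that its
% output minimizes H on states visited during data generation
\min_{\theta} \; \mathbb{E}_{(t, x)} \left[ H\big(t, x, \pi_\theta(t, x)\big) \right]
```

In this reading, the teacher supplies the value gradient and the cost and dynamics terms at sampled states, so the learner is scored by how good its action is under the teacher's optimal control problem instead of by its distance to the teacher's action, which is what Behavioral Cloning measures.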

The paper also outlines a training pipeline in which data generation and policy training run asynchronously, supported by a simulation platform for validation before hardware deployment. The experiments show that policies learned with MPC-Net are not only adaptable but can replace MPC in deployment, running in real time with reduced computational overhead.
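The interplay of data generation and policy training can be sketched on a toy problem. Everything below is an assumption made for illustration: a scalar linear system replaces the robot, a closed-form LQR solution stands in for the MPC teacher, the two phases are interleaved sequentially instead of running asynchronously, and the mixing weight `alpha` that hands control from teacher to learner is a hypothetical schedule.

```python
import numpy as np
from collections import deque

# Toy continuous-time system dx/dt = A*x + B*u with stage cost Q*x^2 + R*u^2.
# All constants are illustrative assumptions, not values from the paper.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0
# Closed-form LQR solution stands in for the MPC teacher: V(x) = P * x^2.
P = R * (A + np.sqrt(A**2 + B**2 * Q / R)) / B**2
K_STAR = P * B / R  # optimal feedback gain, u* = -K_STAR * x

def train(iterations=200, steps_per_rollout=20, grad_steps=10,
          batch_size=32, lr=0.05, dt=0.05, seed=0):
    rng = np.random.default_rng(seed)
    buffer = deque(maxlen=2000)  # replay buffer of (state, dV/dx) pairs
    k = 0.0                      # learner policy: u = -k * x
    for it in range(iterations):
        # Data generation: roll out a mix of teacher and learner so that the
        # buffer gradually covers the states the learner itself visits.
        alpha = 1.0 - it / iterations
        x = rng.uniform(-1.0, 1.0)
        for _ in range(steps_per_rollout):
            buffer.append((x, 2.0 * P * x))  # teacher provides dV/dx at x
            u = -(alpha * K_STAR + (1.0 - alpha) * k) * x
            x += dt * (A * x + B * u)        # explicit Euler step
        # Policy training: gradient descent on the sampled Hamiltonian.
        for _ in range(grad_steps):
            idx = rng.integers(len(buffer), size=batch_size)
            xs = np.array([buffer[i][0] for i in idx])
            vx = np.array([buffer[i][1] for i in idx])
            us = -k * xs
            # H = Q x^2 + R u^2 + V_x (A x + B u)
            # dH/du = 2 R u + V_x B, and du/dk = -x
            grad_k = np.mean((2.0 * R * us + vx * B) * (-xs))
            k -= lr * grad_k
    return k

k_learned = train()
print(k_learned, K_STAR)
```

In this toy setting the learned gain approaches the teacher's gain because the minimizer of the Hamiltonian coincides with the optimal control law; on the real system the teacher would be an MPC solver queried along the rollout.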

Looking ahead, this research opens pathways toward more adaptable robotic systems in which multiple behaviors can be switched in real time. It could influence the development of autonomous systems across various terrains and tasks, potentially extending beyond quadrupedal robots to other robot configurations and applications.

In conclusion, "Imitation Learning from MPC for Quadrupedal Multi-Gait Control" presents a technically rigorous and practically significant contribution to robotic control. While the paper provides substantial evidence of its method's efficacy, scaling the strategy to more complex gaits and environments remains to be explored before the approach can move from experimental validation to broader real-world use.