GRoQ-LoCO: Generalist and Robot-agnostic Quadruped Locomotion Control using Offline Datasets

Published 16 May 2025 in cs.RO, cs.AI, and cs.LG (arXiv:2505.10973v3)

Abstract: Recent advancements in large-scale offline training have demonstrated the potential of generalist policy learning for complex robotic tasks. However, applying these principles to legged locomotion remains a challenge due to continuous dynamics and the need for real-time adaptation across diverse terrains and robot morphologies. In this work, we propose GRoQ-LoCO, a scalable, attention-based framework that learns a single generalist locomotion policy across multiple quadruped robots and terrains, relying solely on offline datasets. Our approach leverages expert demonstrations of two distinct locomotion behaviors, stair traversal (non-periodic gaits) and flat-terrain traversal (periodic gaits), collected across multiple quadruped robots, to train a generalist model that enables behavior fusion. Crucially, our framework operates solely on proprioceptive data from all robots without incorporating any robot-specific encodings. The policy is directly deployable on an Intel i7 NUC, producing low-latency control outputs without any test-time optimization. Our extensive experiments demonstrate zero-shot transfer across highly diverse quadruped robots and terrains, including hardware deployment on the Unitree Go1, a commercially available 12 kg robot. Notably, we evaluate challenging cross-robot training setups where different locomotion skills are unevenly distributed across robots, yet observe successful transfer of both flat walking and stair traversal behaviors to all robots at test time. We also show preliminary walking on Stoch 5, a 70 kg quadruped, on flat and outdoor terrains without requiring any fine-tuning. These results demonstrate the potential of offline, data-driven learning to generalize locomotion across diverse quadruped morphologies and behaviors.


Summary

Overview of GRoQ-LoCO: Generalist and Robot-agnostic Quadruped Locomotion Control

The paper "GRoQ-LoCO: Generalist and Robot-agnostic Quadruped Locomotion Control using Offline Datasets" introduces a method for learning a single generalist locomotion controller for multiple quadruped robots and terrains, trained entirely on offline datasets. Rather than training the deployed policy with reinforcement learning, as is conventional in legged locomotion, it relies on scalable offline learning, an approach widely used in robotic manipulation but much less explored for legged robots.

The central contribution of GRoQ-LoCO is a single locomotion policy that runs on different quadruped robots without robot-specific encodings or test-time optimization. The policy uses a modular, attention-based architecture that processes only proprioceptive inputs and handles both periodic gaits (flat-terrain walking) and non-periodic gaits (stair traversal).
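The summary does not say exactly how attention is applied to the proprioceptive input. One plausible reading can be sketched in NumPy, assuming (hypothetically) that the observation is split into one token per joint (position, velocity, previous action) plus one base-state token (angular velocity, projected gravity), that the tokens are mixed by single-head scaled dot-product self-attention, and that the result is mean-pooled into one vector. The names `init_obs_encoder` and `encode_observation`, the 12-joint layout, the feature choices, and the model width are illustrative assumptions, not the paper's design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def init_obs_encoder(rng, d_model=32, n_joints=12, scale=0.1):
    # HYPOTHETICAL parameter set; all shapes and feature choices are assumptions.
    return {
        "W_joint": rng.normal(0.0, scale, (3, d_model)),  # per joint: [pos, vel, prev_action]
        "W_base": rng.normal(0.0, scale, (6, d_model)),   # base: [ang_vel (3), gravity (3)]
        # Slot embedding marks WHICH joint position a token occupies, not which robot.
        "slot": rng.normal(0.0, scale, (n_joints + 1, d_model)),
        "Wq": rng.normal(0.0, scale, (d_model, d_model)),
        "Wk": rng.normal(0.0, scale, (d_model, d_model)),
        "Wv": rng.normal(0.0, scale, (d_model, d_model)),
    }

def encode_observation(p, joint_feats, base_feats):
    """joint_feats: (n_joints, 3); base_feats: (6,).
    Returns (pooled embedding of shape (d_model,), attention weights (n, n))."""
    tokens = np.concatenate(
        [joint_feats @ p["W_joint"], (base_feats @ p["W_base"])[None, :]], axis=0
    )
    tokens = tokens + p["slot"]
    q, k, v = tokens @ p["Wq"], tokens @ p["Wk"], tokens @ p["Wv"]
    # Scaled dot-product self-attention across the proprioceptive tokens.
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    attended = weights @ v + tokens  # residual connection
    return attended.mean(axis=0), weights  # mean-pool tokens into one vector

rng = np.random.default_rng(0)
params = init_obs_encoder(rng)
obs_vec, attn = encode_observation(params, rng.normal(size=(12, 3)), rng.normal(size=6))
print(obs_vec.shape, attn.shape)  # (32,) (13, 13)
```

Because the slot embedding encodes joint position rather than robot identity, a sketch like this stays compatible with the paper's constraint of using no robot-specific encodings, provided every robot reports its joints in the same order.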

Methodology

The framework follows a behavior cloning paradigm: expert demonstrations collected in simulation serve as the training dataset. The dataset spans multiple quadruped designs and two locomotion behaviors, flat-terrain walking and stair traversal. Covering several robots and terrains in one dataset is intended to give the model the breadth it needs to generalize.
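Pooling demonstrations from several robots into one behavior-cloning dataset can be sketched as follows, assuming (hypothetically) that every robot exposes the same proprioceptive observation layout and a 12-dimensional action, and that the model consumes fixed-length observation histories. The functions `make_windows` and `pool_datasets`, the 45-dimensional observation, and the history length of 8 are illustrative assumptions. The robot name only organizes the input and is never written into a sample, mirroring the absence of robot-specific encodings.

```python
import numpy as np

def make_windows(obs, actions, history):
    """Slice one trajectory into (observation-history, expert-action) pairs.

    obs: (T, obs_dim), actions: (T, act_dim). Each sample pairs the last
    `history` observations with the expert action at the final step.
    """
    T = obs.shape[0]
    xs = [obs[t - history + 1 : t + 1] for t in range(history - 1, T)]
    ys = [actions[t] for t in range(history - 1, T)]
    return np.stack(xs), np.stack(ys)

def pool_datasets(per_robot_trajectories, history=8):
    """Merge trajectories from several robots into one supervised dataset.

    per_robot_trajectories: dict mapping robot name -> list of (obs, actions).
    The robot name is NOT stored in any sample, so the model never sees a
    robot identifier. ASSUMPTION: all robots share one observation layout.
    """
    X, Y = [], []
    for _robot, trajectories in per_robot_trajectories.items():
        for obs, actions in trajectories:
            if obs.shape[0] < history:
                continue  # too short to form a single full window
            x, y = make_windows(obs, actions, history)
            X.append(x)
            Y.append(y)
    return np.concatenate(X), np.concatenate(Y)

# Random placeholder trajectories (not real data); sizes are illustrative.
rng = np.random.default_rng(0)
obs_dim, act_dim = 45, 12
demos = {
    "robot_a_flat_only": [(rng.normal(size=(100, obs_dim)), rng.normal(size=(100, act_dim)))],
    "robot_b_stairs_only": [
        (rng.normal(size=(50, obs_dim)), rng.normal(size=(50, act_dim))),
        (rng.normal(size=(5, obs_dim)), rng.normal(size=(5, act_dim))),
    ],
}
X, Y = pool_datasets(demos, history=8)
print(X.shape, Y.shape)  # (136, 8, 45) (136, 12)
```

In this toy input, robot A contributes only flat-walking data and robot B only stair data, loosely echoing the uneven skill distributions the abstract describes; the 5-step trajectory is skipped because it is shorter than one window. Shuffling the pooled arrays before minibatching would mix robots within each batch.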

GRoQ-LoCO processes its inputs through a sequential pipeline of observation encoders, a GRU-based temporal model, and attention layers, followed by an MLP that predicts the locomotion actions. Attention is applied both over the observation and over the GRU's history of hidden states, which lets the model capture temporal dependencies. Training uses an adaptive loss that weights individual action dimensions according to their variance, so that some actions are emphasized more than others.
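The temporal part of this pipeline and one possible variance-based loss can be sketched in NumPy with random weights. The sketch assumes a single-layer GRU whose hidden states are kept as a history, attention over that history with the latest hidden state as the query, and a two-layer MLP head; its input is a sequence of already-encoded observation vectors. The loss form (per-dimension weights proportional to the batch variance of the expert actions, normalized to mean 1) is an assumption, since the summary states only that the loss emphasizes actions based on their variance. All names and sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = x - x.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def init_policy(rng, in_dim=32, hid=64, act_dim=12, scale=0.1):
    # HYPOTHETICAL shapes: single-layer GRU, attention readout, 2-layer MLP head.
    def w(*shape):
        return rng.normal(0.0, scale, shape)
    return {
        "Wz": w(in_dim + hid, hid), "Wr": w(in_dim + hid, hid), "Wh": w(in_dim + hid, hid),
        "bz": np.zeros(hid), "br": np.zeros(hid), "bh": np.zeros(hid),
        "W1": w(2 * hid, hid), "b1": np.zeros(hid),
        "W2": w(hid, act_dim), "b2": np.zeros(act_dim),
    }

def gru_step(p, x, h):
    xh = np.concatenate([x, h])
    z = sigmoid(xh @ p["Wz"] + p["bz"])  # update gate
    r = sigmoid(xh @ p["Wr"] + p["br"])  # reset gate
    h_tilde = np.tanh(np.concatenate([x, r * h]) @ p["Wh"] + p["bh"])  # candidate state
    return (1.0 - z) * h + z * h_tilde

def policy_forward(p, obs_seq):
    """obs_seq: (T, in_dim) encoded observations. Returns an action of shape (act_dim,)."""
    h = np.zeros_like(p["bz"])
    history = []
    for x in obs_seq:
        h = gru_step(p, x, h)
        history.append(h)
    H = np.stack(history)                      # (T, hid) GRU hidden-state history
    scores = H @ h / np.sqrt(h.shape[0])       # query = most recent hidden state
    context = softmax(scores) @ H              # attention-weighted summary of history
    hidden = np.tanh(np.concatenate([h, context]) @ p["W1"] + p["b1"])
    return hidden @ p["W2"] + p["b2"]

def variance_weighted_mse(pred, target, eps=1e-6):
    """pred, target: (B, act_dim). ASSUMED form: weight each action dimension by
    the batch variance of the expert actions, normalized so weights average 1."""
    var = target.var(axis=0) + eps
    weights = var / var.mean()
    return float(np.mean(weights * (pred - target) ** 2))

rng = np.random.default_rng(0)
params = init_policy(rng)
action = policy_forward(params, rng.normal(size=(8, 32)))
print(action.shape)  # (12,)

# Placeholder batch: random "expert" targets and predictions from random inputs.
targets = rng.normal(size=(16, 12))
preds = np.stack([policy_forward(params, rng.normal(size=(8, 32))) for _ in range(16)])
print(variance_weighted_mse(preds, targets))
```

Using the newest hidden state as the attention query lets the head draw on earlier states relevant to the current step, for example the phase of a periodic gait, instead of relying on the final GRU state alone. This sketch covers only the forward computations; actual training would minimize the loss with an autodiff framework.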

Experimental Validation

The experimental results support GRoQ-LoCO's zero-shot transfer across distinct embodiments and terrains. The policy was deployed on Unitree Go1 hardware and, in preliminary tests, walked on the 70 kg Stoch 5 over flat and outdoor terrain, in both cases without fine-tuning. In the reported experiments the policy climbed stairs and navigated previously unseen terrains, which the authors attribute to the diversity of the training data.

In some setups, robots that received no demonstrations of a given skill performed comparably to, or even better than, robots explicitly trained on that skill. For instance, Go1 climbed higher stairs than some of the explicitly trained configurations. This suggests that explicit per-embodiment training on every skill may not always be necessary.

Implications and Future Directions

GRoQ-LoCO is a step toward generalized control strategies in robotics, showing that offline training alone can produce robust locomotion in diverse and complex environments. Beyond its practical value for robot autonomy on unstructured terrain, the approach may offer insight into how learning scales for multi-embodiment robotics.

Future work might extend the approach to robots with more diverse morphologies, such as bipeds or hexapods, and add exteroceptive inputs such as vision, which could improve situational awareness and allow even broader generalization and adaptability.

Overall, the research points toward more versatile and adaptive robotic systems that depend less on intensive fine-tuning.
