Teacher-Aligned Representations via Contrastive Learning for Quadrupedal Locomotion
The paper presents Teacher-Aligned Representations via Contrastive Learning (TARL), a framework for enhancing quadrupedal locomotion. It addresses core challenges of teacher-student reinforcement learning (RL) pipelines for robotic locomotion, in which a privileged teacher policy guides a proprioceptive student policy: representation misalignment between teacher and student, covariate shift, and the lack of policies that remain adaptable after deployment.
Technical Approach
TARL uses privileged information available in simulation to train structured latent spaces via self-supervised contrastive learning. Rather than regressing student features directly onto teacher features, it aligns the student's representation to the privileged teacher's with a contrastive objective. By combining self-supervised learning (SSL) techniques with contrastive learning, TARL bridges the representation gap, yields a better-structured latent space, and improves generalization in out-of-distribution (OOD) scenarios.
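The alignment described above can be sketched as an InfoNCE-style contrastive loss: each student latent is pulled toward the teacher latent from the same transition (the positive) and pushed away from the other teacher latents in the batch (the negatives). This is a minimal illustration of the general technique, not the paper's exact objective; the function name and temperature value are assumptions.

```python
import numpy as np

def info_nce_alignment(student_z, teacher_z, temperature=0.1):
    """InfoNCE-style contrastive alignment of student latents to teacher
    latents. Matched rows (same transition) are positives; every other
    teacher latent in the batch serves as a negative."""
    # L2-normalise both embedding sets so similarity is cosine similarity.
    s = student_z / np.linalg.norm(student_z, axis=1, keepdims=True)
    t = teacher_z / np.linalg.norm(teacher_z, axis=1, keepdims=True)
    logits = (s @ t.T) / temperature                 # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal of the similarity matrix.
    return -np.mean(np.diag(log_probs))
```

Minimising this loss drives each student embedding toward its matched teacher embedding while keeping it distinguishable from the rest of the batch, which is what gives the latent space its structure.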
Key Contributions
Efficient Representation Alignment: TARL employs a contrastive learning strategy that aligns student representations with privileged teacher-derived features, thereby mitigating inherent mismatches between their latent spaces. The approach reduces training time by 50% and improves OOD generalization performance by 42.2%, outperforming established baselines.
Robust Adaptation and Negative Sampling: The model integrates a task-informed negative sampling mechanism, which yields a 5.6% improvement on the paper's evaluation metrics. This strategy sharpens representation separation by making the latent space task-relevant and environmentally adaptive.
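One plausible reading of task-informed negative sampling is that negatives for a given anchor are drawn preferentially from other tasks or terrains, so the contrastive objective separates task-relevant factors in the latent space. The sketch below illustrates that idea; the `task_ids` labelling and the fallback to same-task negatives are assumptions, not details from the paper.

```python
import numpy as np

def task_informed_negatives(embeddings, task_ids, anchor_idx,
                            num_negatives=4, rng=None):
    """Select negatives for one anchor, preferring samples whose task or
    terrain label differs from the anchor's (illustrative sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    idx = np.arange(len(task_ids))
    other = idx[task_ids != task_ids[anchor_idx]]          # cross-task pool
    same = idx[(task_ids == task_ids[anchor_idx]) & (idx != anchor_idx)]
    # Fall back to same-task samples only if the cross-task pool is too small.
    pool = other if len(other) >= num_negatives else np.concatenate([other, same])
    chosen = rng.choice(pool, size=num_negatives, replace=False)
    return embeddings[chosen], chosen
```

Because negatives come mostly from other tasks, pushing them away in embedding space encourages the latent dimensions to encode the task- and terrain-specific information the policy needs.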
Deployable Learning Capabilities: TARL circumvents reliance on privileged information during real-world deployment by aligning proprioceptive representations through contrastive learning, facilitating adaptability and continual learning beyond simulation.
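The deployment story above amounts to a student that consumes only proprioceptive observations, its encoder having been aligned to the teacher during training. The sketch below shows that interface with a toy linear encoder and policy head; the layer shapes and dimensions are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

class StudentPolicy:
    """Deployment-time student: acts from a proprioceptive history alone,
    with no privileged (simulation-only) observations required."""
    def __init__(self, obs_dim=48, history=5, latent_dim=32, act_dim=12, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim * history
        # Encoder whose output was aligned to the teacher during training.
        self.W_enc = rng.normal(0.0, 0.05, (in_dim, latent_dim))
        # Policy head mapping the aligned latent to joint targets.
        self.W_pi = rng.normal(0.0, 0.05, (latent_dim, act_dim))

    def act(self, proprio_history):
        z = np.tanh(proprio_history.reshape(-1) @ self.W_enc)  # aligned latent
        return np.tanh(z @ self.W_pi)                          # bounded actions
```

The key property is the signature of `act`: nothing in it depends on privileged state, so the same policy runs unchanged on hardware and can keep adapting from onboard sensing alone.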
Results and Implications
The framework demonstrated accelerated training performance, achieving peak performance twice as fast as state-of-the-art methods. Evaluation on diverse terrains and simulated environments revealed TARL's superior generalization in both in-distribution and out-of-distribution scenarios.
The research findings highlight the pivotal role of representation learning in quadrupedal locomotion. TARL establishes new benchmarks for sample-efficient adaptive locomotion, challenging existing paradigms in RL-based robotic controllers. By enabling continual adaptation, TARL extends its applicability to dynamic and non-stationary environments, offering promising avenues for future developments in adaptive AI systems.
Future Directions
The research sets forth several extensions, particularly integrating off-policy RL algorithms to further enhance deployable learning efficiency. Additionally, exploring the framework's adaptability across various robotic morphologies and integrating real-world fine-tuning mechanisms are recommended next steps.
TARL marks a significant advance in aligning teacher-student architectures for robotic locomotion. Its contrastive learning foundation offers a concrete route to greater robustness and adaptability, paving the way for more resilient autonomous systems in complex environments.