Learning to Walk in Costume: Adversarial Motion Priors for Aesthetically Constrained Humanoids

Published 6 Sep 2025 in cs.RO, cs.AI, and eess.SY | (2509.05581v1)

Abstract: We present a Reinforcement Learning (RL)-based locomotion system for Cosmo, a custom-built humanoid robot designed for entertainment applications. Unlike traditional humanoids, entertainment robots present unique challenges due to aesthetic-driven design choices. Cosmo embodies these with a disproportionately large head (16% of total mass), limited sensing, and protective shells that considerably restrict movement. To address these challenges, we apply Adversarial Motion Priors (AMP) to enable the robot to learn natural-looking movements while maintaining physical stability. We develop tailored domain randomization techniques and specialized reward structures to ensure safe sim-to-real, protecting valuable hardware components during deployment. Our experiments demonstrate that AMP generates stable standing and walking behaviors despite Cosmo's extreme mass distribution and movement constraints. These results establish a promising direction for robots that balance aesthetic appeal with functional performance, suggesting that learning-based methods can effectively adapt to aesthetic-driven design constraints.


Summary

The paper demonstrates that adversarial motion priors (AMP) can enable lifelike walking in aesthetically constrained humanoids by overcoming stability challenges.
It details a sim-to-real pipeline that combines domain randomization, motion retargeting from the CMU Mocap Dataset, and low-level PD controllers to cope with the robot's top-heavy mass distribution.
The reinforcement learning approach shows robust performance in dynamic environments, combining visual appeal with functional stability for entertainment robotics.

Introduction to Entertainment Humanoids

The paper introduces a reinforcement learning-based locomotion system tailored for Cosmo, a custom humanoid robot uniquely designed for entertainment applications. Unlike traditional humanoids, robots in the entertainment sector face significant challenges due to design choices that prioritize aesthetics over functional stability. These challenges include disproportionate body parts that affect the center of mass, limited sensing capabilities due to aesthetic shells, and restricted joint mobility due to protective coverings.

Cosmo embodies these challenges: it has no onboard vision system, and its oversized head (16% of total mass) raises the center of mass. This makes it a demanding test case for comparing conventional locomotion methods with learning-based alternatives. Using Adversarial Motion Priors (AMP), the study aims to have Cosmo learn natural-looking movements while remaining physically stable, and it relies on tailored domain randomization and specialized reward structures to support balancing, walking, and safe deployment.

Figure 1: Cosmo: an entertainment humanoid robot with covers designed for a blockbuster movie. (Left): CAD Design. (Top): Using Isaac Gym's massively parallelized environments to train with different styles and terrain. (Bottom): Sim-to-Real demonstration of natural walking (see supplementary video).

Methods and Techniques

Motion Retargeting

Because Cosmo's morphology differs from a human's, human motion data must be retargeted to fit the robot's kinematic constraints. The paper uses the Rokoko plugin in Blender for this purpose, mapping clips from the CMU Mocap Dataset onto custom animation rigs that match Cosmo's proportions.

Imitation Learning with AMP

The paper frames locomotion as a Partially Observable Markov Decision Process (POMDP) and learns a policy that maps observations to actions. Training uses AMP, in which a discriminator network learns to distinguish the policy's motions from those in the reference dataset. The discriminator's output serves as a style reward that pushes the policy toward lifelike motion, while task rewards enforce task completion and adherence to physical constraints.
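As a rough illustration of how a discriminator score can shape the policy's reward, the sketch below follows the least-squares formulation from the original AMP work (Peng et al., 2021), which is not necessarily the exact variant used for Cosmo. The weights `w_task` and `w_style` and all function names are assumptions made for illustration, and the gradient penalty normally added to the discriminator loss is omitted.

```python
import numpy as np


def amp_style_reward(disc_score):
    """Style reward from a discriminator score d: max(0, 1 - 0.25 * (d - 1)^2)."""
    d = np.asarray(disc_score, dtype=float)
    return np.maximum(0.0, 1.0 - 0.25 * (d - 1.0) ** 2)


def discriminator_loss(d_ref, d_policy):
    """Least-squares loss: push reference scores toward +1, policy scores toward -1."""
    d_ref = np.asarray(d_ref, dtype=float)
    d_policy = np.asarray(d_policy, dtype=float)
    return float(np.mean((d_ref - 1.0) ** 2) + np.mean((d_policy + 1.0) ** 2))


def combined_reward(task_reward, disc_score, w_task=0.5, w_style=0.5):
    """Weighted sum of task and style rewards (weights are hypothetical)."""
    task = np.asarray(task_reward, dtype=float)
    return w_task * task + w_style * amp_style_reward(disc_score)
```

A transition that the discriminator scores near +1 (indistinguishable from reference motion) earns the full style reward, while a score at or below -1 earns none, so the policy is rewarded only when its motion resembles the reference clips.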

By incorporating AMP with a comprehensive sim-to-real transfer pipeline that includes domain randomization and hardware tuning, the study tackles significant mechanical limitations and entertainment-focused design constraints.

Simulation and Domain Randomization

To address Cosmo's instability due to its large head, the authors used NVIDIA's Isaac Sim to create detailed simulations that analyze stability based on the center of mass and foot-ground contact points. During training, they applied domain randomization to make the policy robust to dynamic uncertainties and sensor noise.
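Domain randomization of this kind is commonly implemented by resampling physical parameters at the start of each episode and perturbing observations at every step. The sketch below shows that general pattern only; the parameter names, ranges, and noise level are hypothetical placeholders, since this summary does not give the paper's values.

```python
import numpy as np

# Hypothetical ranges, chosen only to illustrate the pattern.
RANDOMIZATION_RANGES = {
    "friction": (0.5, 1.25),
    "link_mass_scale": (0.9, 1.1),
    "motor_strength_scale": (0.9, 1.1),
}
OBS_NOISE_STD = 0.01  # hypothetical sensor-noise standard deviation


def sample_episode_params(rng):
    """Draw one set of dynamics parameters uniformly from the ranges above."""
    return {name: float(rng.uniform(lo, hi))
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}


def add_observation_noise(obs, rng, std=OBS_NOISE_STD):
    """Add zero-mean Gaussian noise to an observation vector."""
    obs = np.asarray(obs, dtype=float)
    return obs + rng.normal(0.0, std, size=obs.shape)


rng = np.random.default_rng(0)
params = sample_episode_params(rng)              # resampled once per episode
noisy_obs = add_observation_noise(np.zeros(4), rng)  # applied every step
```

Training across many sampled parameter sets means the policy cannot overfit to one exact simulated model, which is the intended source of robustness at deployment.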

Figure 2: Cosmo visualization: (left) arm range of motion and internal vs. exterior housing comparison; (right) mass distribution analysis highlighting the disproportionate head mass.
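The stability analysis described in this section, based on the center of mass and foot-ground contact points, can be approximated by a static check: compute the center of mass and test whether its ground projection falls inside the support polygon. The sketch below is a simplified illustration, not the authors' analysis. The two-lump mass model, the link positions, and the foot polygon are hypothetical; only the 16% head mass fraction comes from the paper's abstract.

```python
import numpy as np


def center_of_mass(masses, positions):
    """Mass-weighted average of link positions (N x 3 array)."""
    masses = np.asarray(masses, dtype=float)
    positions = np.asarray(positions, dtype=float)
    return (masses[:, None] * positions).sum(axis=0) / masses.sum()


def inside_convex_polygon(point_xy, vertices_xy):
    """True if the point lies inside a convex polygon with counter-clockwise vertices."""
    v = np.asarray(vertices_xy, dtype=float)
    p = np.asarray(point_xy, dtype=float)
    edges = np.roll(v, -1, axis=0) - v
    to_point = p - v
    # A non-negative cross product means the point is on the left of every edge.
    cross = edges[:, 0] * to_point[:, 1] - edges[:, 1] * to_point[:, 0]
    return bool(np.all(cross >= 0.0))


# Hypothetical two-lump model: head (16% of mass) and the rest of the body.
masses = [0.16, 0.84]
positions = [[0.05, 0.0, 1.0],   # head, leaning slightly forward
             [0.00, 0.0, 0.5]]   # remaining body
com = center_of_mass(masses, positions)

# Hypothetical support polygon spanned by both feet (metres, counter-clockwise).
support = [(-0.10, -0.15), (0.10, -0.15), (0.10, 0.15), (-0.10, 0.15)]
statically_stable = inside_convex_polygon(com[:2], support)
```

This check covers static balance only; walking requires dynamic criteria that the sketch does not attempt to model.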

Practical Implementation

For hardware, Cosmo uses Westwood Robotics actuators, which enable precise torque control using internal sensing. Low-level PD controllers translate the policy's outputs into actuator commands. This setup supports stable operation despite the atypical mass distribution of Cosmo's design.
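A joint-level PD law of the kind mentioned here can be sketched as follows. It assumes the policy outputs target joint positions, which is a common arrangement but is not confirmed by this summary; the gains and torque limit in the usage note are hypothetical.

```python
import numpy as np


def pd_torque(q_target, q, q_dot, kp, kd, torque_limit):
    """PD law: tau = kp * (q_target - q) - kd * q_dot, clipped to the torque limit."""
    q_target = np.asarray(q_target, dtype=float)
    q = np.asarray(q, dtype=float)
    q_dot = np.asarray(q_dot, dtype=float)
    tau = kp * (q_target - q) - kd * q_dot
    return np.clip(tau, -torque_limit, torque_limit)
```

For example, `pd_torque([0.1], [0.0], [0.0], kp=50.0, kd=1.0, torque_limit=10.0)` commands a torque proportional to the 0.1 rad position error, while a much larger error saturates at the limit, which is one simple way to protect the hardware.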

Figure 3: Sim-to-Real pipeline: (a) Retargeting from diverse data sources, (b) Training, (c) Validation, (d) Deployment.

Experimental Results

Stability and Flexible Motion

The trained policy demonstrates stable standing and walking across several motion styles, from basic standing to expressive walking with dynamic, human-like qualities (Figure 4). The AMP-based reinforcement learning policy also copes with the real-world physical constraints of the hardware.

Figure 4: AMP policies with styles for balancing, model-based walking and walking with swagger.

Performance and Sim-to-Real Transfer

The study achieves robust balancing capabilities despite Cosmo's challenging morphology, using domain randomization to bridge the sim-to-real gap. The resulting locomotion and disturbance rejection validate the efficacy of AMP in producing human-like adaptability in aesthetic-centric designs.

Figure 5: Joint tracking and disturbance rejection. Shaded regions indicate disturbance periods.

Conclusion

This research demonstrates the potential of learning-based frameworks, particularly Adversarial Motion Priors, to address the stability and motion challenges of aesthetically constrained humanoid robots. By showing that AMP-guided reinforcement learning can produce stable, natural-looking locomotion on such a platform, the study opens new possibilities for combining aesthetic appeal with functional performance in entertainment robotics.

Although the approach prioritizes stability and expressiveness over agility, it is well suited to entertainment applications, where visual plausibility and hardware preservation are critical. Future work could compare the approach against traditional controllers and explore its applicability to other morphologies within the entertainment sector.