General Humanoid Whole-Body Control via Pretraining and Fast Adaptation

Published 12 Feb 2026 in cs.RO | (2602.11929v1)

Abstract: Learning a general whole-body controller for humanoid robots remains challenging due to the diversity of motion distributions, the difficulty of fast adaptation, and the need for robust balance in high-dynamic scenarios. Existing approaches often require task-specific training or suffer from performance degradation when adapting to new motions. In this paper, we present FAST, a general humanoid whole-body control framework that enables Fast Adaptation and Stable Motion Tracking. FAST introduces Parseval-Guided Residual Policy Adaptation, which learns a lightweight delta action policy under orthogonality and KL constraints, enabling efficient adaptation to out-of-distribution motions while mitigating catastrophic forgetting. To further improve physical robustness, we propose Center-of-Mass-Aware Control, which incorporates CoM-related observations and objectives to enhance balance when tracking challenging reference motions. Extensive experiments in simulation and real-world deployment demonstrate that FAST consistently outperforms state-of-the-art baselines in robustness, adaptation efficiency, and generalization.


Summary

The paper presents a unified framework (FAST) that enables robust humanoid whole-body control through large-scale pretraining and rapid adaptation.
It leverages a Mixture-of-Experts architecture with CoM-aware design and integrates Parseval regularization with KL constraints to maintain stability during adaptation.
Experimental results in simulation and on a Unitree G1 robot demonstrate superior long-horizon tracking, stability, and retention of source-domain performance.

FAST: A Unified Framework for General Humanoid Whole-Body Control via Pretraining and Fast Adaptation

Problem Statement and Motivation

The challenge of general whole-body control (WBC) for humanoid robots lies in executing diverse, coordinated whole-body motions with robustness to substantial distribution shifts encountered in realistic deployments. Classical approaches, often anchored in task-specific reward engineering or limited kinematic datasets, suffer performance degradation when tracking high-dynamic or out-of-distribution (OOD) motions, particularly those generated from low-fidelity modalities such as monocular video, teleoperation, or text-to-motion pipelines. Deployment constraints on inference latency and hardware further limit the practicality of foundation model scaling as a solution. FAST directly targets these limitations by explicitly designing for robust zero-shot generalization and rapid, stable adaptation to new or noisy motion distributions.

Methodological Innovations

The FAST pipeline is composed of three interdependent stages:

Curated Motion Dataset Construction: Human-to-humanoid retargeting is performed on diverse motion capture datasets (AMASS, OMOMO, in-house MoCap), incorporating substantial data augmentation through global velocity perturbation and lower-limb configuration variability. Auxiliary physical signals—contact masks, Center-of-Mass (CoM), and Center-of-Pressure (CoP)—are integrated to reinforce physical plausibility and stability cues.
Pretraining a Mixture-of-Experts Whole-Body Controller with CoM-Aware Control: The policy architecture leverages a Mixture-of-Experts (MoE) MLP structure with a gating network, enabling specialization across dynamic regimes while maintaining global coordination. The CoM-aware design augments observations with CoM/CoP and adds adaptive tracking rewards and explicit stability terms, prioritizing physical stability over strict tracking when references are aggressive or physically inconsistent. The system can therefore relax the imitation objective when a reference is not executable, which increases robustness in practical deployments.
Parseval-Guided Residual Policy Adaptation: For rapid adaptation, a lightweight residual delta policy (MLP) is introduced atop the frozen base policy, with adaptation occurring exclusively in the residual head. Adaptation is regularized by:
- Parseval Regularization: Enforces near-orthogonality in feature directions, bounding sensitivity to input perturbations and enabling smoother gradients, thus sustaining stable, sample-efficient optimization during fast adaptation.
- KL-Constrained Policy Update: Maintains distributional proximity to the base policy, mitigating catastrophic forgetting and preserving prior capabilities across OOD adaptation.
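The two regularizers above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the fixed shared standard deviation `sigma`, and the weights `lambda_orth` and `beta_kl` are assumptions chosen for illustration, and the policies are assumed to be Gaussian with identical covariance so that the KL term has a closed form.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def parseval_penalty(w):
    """Squared Frobenius norm of (W^T W - I); zero when W has orthonormal columns."""
    gram = matmul(transpose(w), w)
    n = len(gram)
    return sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))


def gaussian_kl(mu_new, mu_base, sigma):
    """KL(N(mu_new, sigma^2 I) || N(mu_base, sigma^2 I)) with a shared, fixed sigma (assumed)."""
    return sum((a - b) ** 2 for a, b in zip(mu_new, mu_base)) / (2.0 * sigma ** 2)


def adaptation_loss(task_loss, residual_weights, base_action, delta_action,
                    sigma=0.2, lambda_orth=1e-2, beta_kl=1.0):
    """Task loss plus both regularizers; only the residual head's weights are penalized."""
    adapted = [a + d for a, d in zip(base_action, delta_action)]  # base policy stays frozen
    orth = sum(parseval_penalty(w) for w in residual_weights)
    kl = gaussian_kl(adapted, base_action, sigma)
    return task_loss + lambda_orth * orth + beta_kl * kl
```

With a zero residual action and orthonormal residual weights, both penalties vanish and the loss reduces to the task loss, which matches the intuition that the adapted policy starts out identical to the base policy.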

These innovations are theoretically supported by formal proofs establishing bounded Lipschitz continuity (for robust smoothness) and quantitative KL-induced constraints on policy deviation magnitude.
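As a hedged illustration of how bounds of this kind typically arise (the paper's exact statements may differ), suppose the residual network $\Delta_\theta$ has $L$ layers with 1-Lipschitz activations and weight matrices $W_\ell$, and that the base and adapted policies are Gaussian with shared covariance $\sigma^2 I$ and means $\mu$ and $\mu + \delta$. The symbols $\epsilon_\ell$ (per-layer orthogonality tolerance) and $\kappa$ (KL budget) are notation assumed here, not taken from the paper.

```latex
\[
\|W_\ell^\top W_\ell - I\|_F \le \epsilon_\ell
\;\Longrightarrow\;
\|W_\ell\|_2 \le \sqrt{1+\epsilon_\ell},
\qquad
\operatorname{Lip}(\Delta_\theta) \le \prod_{\ell=1}^{L} \sqrt{1+\epsilon_\ell}
\]
\[
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu+\delta,\sigma^2 I)\,\middle\|\,\mathcal{N}(\mu,\sigma^2 I)\right)
= \frac{\|\delta\|_2^2}{2\sigma^2} \le \kappa
\;\Longrightarrow\;
\|\delta\|_2 \le \sigma\sqrt{2\kappa}
\]
```

The first line uses $\|W\|_2^2 = \lambda_{\max}(W^\top W) \le 1 + \|W^\top W - I\|_F$; the second is the closed-form KL divergence between two Gaussians with equal covariance.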

Figure 1: FAST architecture overview—curated data construction, MoE+CoM-Aware pretraining, and Parseval-guided fast adaptation pipeline.
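The MoE policy from the pretraining stage can be pictured with a small sketch. It assumes a dense soft mixture in which a gating MLP produces softmax weights over the expert MLPs and the action is their weighted sum; the source does not specify the routing scheme, layer sizes, or activations, so `moe_action` and everything inside it are hypothetical.

```python
import math


def linear(x, weights, bias):
    """Affine layer: weights is a list of output rows, bias one value per row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mlp(x, layers):
    """Tiny MLP: tanh on hidden layers, linear output. layers = [(weights, bias), ...]."""
    for i, (w, b) in enumerate(layers):
        x = linear(x, w, b)
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x


def moe_action(obs, gate_layers, experts):
    """Blend expert outputs with gating weights computed from the same observation."""
    gate = softmax(mlp(obs, gate_layers))        # one weight per expert, sums to 1
    outputs = [mlp(obs, e) for e in experts]     # each expert proposes an action
    action_dim = len(outputs[0])
    action = [sum(g * out[j] for g, out in zip(gate, outputs))
              for j in range(action_dim)]
    return action, gate
```

Returning the gate weights alongside the action makes it easy to inspect which experts dominate for a given motion regime.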

Experimental Evaluation

Quantitative experiments are conducted in MuJoCo and IsaacLab simulators and validated on a Unitree G1 humanoid platform. Comprehensive metrics include task success rate, global/pose/keypoint errors, global root velocity error, mean CoM–CoP distance (as a measure of balance), and slippage.
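To make the balance metric concrete, here is a minimal sketch of a mean CoM–CoP distance. It assumes flat ground, measures distance in the horizontal plane only, computes the CoP as the normal-force-weighted average of contact points, and skips frames with no ground contact; the paper's exact definition is not given here, so these choices and the function names are assumptions.

```python
import math


def center_of_mass(masses, positions):
    """Mass-weighted average of link positions (x, y, z)."""
    total = sum(masses)
    return tuple(sum(m * p[k] for m, p in zip(masses, positions)) / total
                 for k in range(3))


def center_of_pressure(contact_points, normal_forces):
    """Normal-force-weighted average of contact points in the ground plane (x, y)."""
    total = sum(normal_forces)
    if total <= 0.0:
        return None  # no ground contact, e.g. a flight phase
    return tuple(sum(f * p[k] for f, p in zip(normal_forces, contact_points)) / total
                 for k in range(2))


def mean_com_cop_distance(frames):
    """Average horizontal CoM-to-CoP distance over frames that have ground contact.

    Each frame is (masses, link_positions, contact_points, normal_forces).
    """
    dists = []
    for masses, positions, contacts, forces in frames:
        cop = center_of_pressure(contacts, forces)
        if cop is None:
            continue
        com = center_of_mass(masses, positions)
        dists.append(math.hypot(com[0] - cop[0], com[1] - cop[1]))
    return sum(dists) / len(dists) if dists else float("nan")
```

A smaller value means the CoM projection stays closer to the point where ground reaction forces act, which is the sense in which the metric serves as a balance proxy.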

Generalization and Robustness: On both in-distribution (AMASS) and OOD (MotionX) datasets, FAST achieves the top success rates, with pronounced improvement in long-horizon tracking and global stability compared to GMT and TWIST2. Notably, while not always minimizing MPJPE, FAST consistently yields lower global position errors, reflecting its higher-level physical coherence.
Fast Adaptation and Retention: In adaptation benchmarks (LaFan1/MotionX as targets, AMASS as source), FAST demonstrates rapid convergence, the best overall target-domain metrics, and substantially better retention of source-domain performance than naive fine-tuning or unregularized residual adaptation. Ablation analysis indicates that combining Parseval and KL regularization is necessary to balance adaptation speed with preservation of the source policy.

Figure 2: Fast adaptation performance on LaFan1 and MotionX, with concurrent preservation of source capability on AMASS.

Stability via CoM-Awareness: In curated high-dynamic OOD settings, explicit CoM-Aware control reduces mean CoM–CoP deviation and slippage, increasing task completion rates versus ablations lacking CoM objectives. Visual analysis shows the suppression of root drift and successful recovery from balance-threatening poses.

Figure 3: CoM-Aware control enables stable execution of challenging dynamic motions, preventing falls and instability evident in baseline controllers.

Real-World Deployment: On the Unitree G1, FAST robustly executes single-leg squats, side kicks, and rapid teleoperation, even when given noisy or ill-posed references from text- or video-generation pipelines, after minimal adaptation. It also generalizes to previously unseen, physically aggressive trajectories without additional retraining.

Theoretical and Practical Implications

The combination of mixture-of-experts architectures, CoM/CoP-augmented policy design, and structured residual adaptation with orthogonalization/KL regularization establishes a principled template for scalable, robust humanoid control. By minimizing policy drift and ensuring smooth adaptation, FAST circumvents the primary failure modes often observed in naive fine-tuning and large-scale pretraining pipelines, particularly when faced with high-dimensional, low-quality, or adversarially perturbed motion distributions.

From a theoretical perspective, FAST demonstrates that regularized residual architectures can tightly bound sensitivity to data distribution shifts and parameter perturbations—critical for safe real-world deployment. Practically, the framework is immediately extensible to online and continual learning domains and robust teleoperation, especially in scenarios with unreliable input modalities.

Directions for Future Research

Key future directions involve:

Full online adaptation under streaming conditions and ultra-low supervision.
Meta-adaptive regularization schedules that dynamically trade off adaptation aggressiveness against stability, based on task or environmental uncertainty.
Extension to manipulator-rich humanoid and multi-agent settings, leveraging hierarchical adaptation and attention mechanisms.
Tight integration with generative motion models, establishing feedback loops between motion synthesis, tracking, and physical evaluation.

Figure 4: Composition and diversity of the OOD high-dynamic dataset, which is critical for robust evaluation under extreme distribution shift.

Conclusion

FAST presents a robust, efficient methodology for general humanoid whole-body control, unifying large-scale, physically-grounded pretraining with stability-constrained fast adaptation. Through rigorous ablation and real-world experiments, it demonstrates superior robustness and adaptability over state-of-the-art universal trackers, providing a critical advance toward practical deployment of capable, resilient humanoid systems.