Being-H0.5: Scaling Human-Centric Robot Learning

A lightning talk describing Being-H0.5, a robot learning framework that leverages 35,000+ hours of data and a unified action space to enable cross-embodiment generalization.
Script
What if learning to control thirty different robots created a translation crisis, with every machine speaking its own distinct physical language? The authors of this paper propose solving it by treating human hand motion as a universal 'mother tongue' for robotic interaction.
The core problem is that each robot typically requires its own dataset because of its unique kinematics and control scheme, which fragments the available training data. To bridge these gaps, the researchers use human interaction traces as a shared physical prior that helps translate diverse robot actions into a common format.
To realize this vision, they constructed UniHand-2.0, a massive dataset combining sixteen thousand hours of human video with diverse robot demonstrations. On the architectural side, they introduced a Unified Action Space that maps these heterogeneous inputs into semantically aligned slots, processing them with a specialized Mixture-of-Flow model.
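To make the idea of "semantically aligned slots" concrete, here is a minimal sketch of how heterogeneous robot actions might be packed into one shared vector. The slot names, dimensions, and zero-pad-plus-mask scheme are illustrative assumptions, not the paper's actual specification.

```python
import numpy as np

# Hypothetical slot layout for a Unified Action Space: every embodiment's
# action is placed into the same named slots, so a dexterous hand and a
# parallel gripper produce vectors of identical shape. (Assumed layout.)
SLOTS = {"wrist_pose": 7, "finger_joints": 22, "gripper": 1}
SLOT_DIM = sum(SLOTS.values())  # 30

def to_unified(action_parts):
    """Pack embodiment-specific action parts into the shared slot vector.

    Slots the embodiment lacks (e.g. a two-finger gripper has no
    finger_joints) are zero-padded, and a boolean mask records which
    entries are real so the model can ignore the padding.
    """
    vec = np.zeros(SLOT_DIM)
    mask = np.zeros(SLOT_DIM, dtype=bool)
    offset = 0
    for name, dim in SLOTS.items():
        if name in action_parts:
            part = np.asarray(action_parts[name], dtype=float)
            assert part.shape == (dim,), f"{name} expects {dim} dims"
            vec[offset:offset + dim] = part
            mask[offset:offset + dim] = True
        offset += dim
    return vec, mask

# A gripper-only robot fills two of the three slots; the rest is masked out.
vec, mask = to_unified({
    "wrist_pose": np.array([0.0, 0.0, 0.3, 1.0, 0.0, 0.0, 0.0]),  # xyz + quat
    "gripper": np.array([0.8]),                                    # open fraction
})
print(vec.shape, int(mask.sum()))  # (30,) 8
```

The payoff of such a mapping is that data from any embodiment lands in the same coordinate frame, so a single model (here, the paper's Mixture-of-Flow) can be trained across all of it.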
This chart highlights the unprecedented scale of the project, contrasting the thirty-five thousand hours of data in UniHand-2.0 against previous efforts. By integrating such a vast array of human and robot experiences, the model learns robust priors that generalize far better than baselines trained on robot data alone.
Deploying this generalist required new engineering techniques such as Manifold-Preserving Gating to handle sensory noise, yielding nearly ninety-nine percent success on simulation benchmarks. However, the authors note that while zero-shot transfer works, it still shows lower motion precision than specialist models.
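The paper does not spell out Manifold-Preserving Gating here, but the general idea of such gating can be sketched: blend a noisy observation with a prior, then project the result back onto the valid manifold so downstream layers never see an invalid state. The function below does this for unit quaternions; the name, gate weight, and hemisphere fix are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gated_quaternion(prior_q, noisy_q, gate):
    """Gate a noisy rotation against a prior, staying on the unit sphere.

    Blends the two quaternions with weight `gate`, then renormalizes so
    the output remains a valid rotation (a point on S^3). Purely a
    sketch of the manifold-preserving idea, not the paper's method.
    """
    prior_q = np.asarray(prior_q, dtype=float)
    noisy_q = np.asarray(noisy_q, dtype=float)
    if np.dot(prior_q, noisy_q) < 0:   # q and -q encode the same rotation;
        noisy_q = -noisy_q             # keep both on the same hemisphere
    blended = (1 - gate) * prior_q + gate * noisy_q
    return blended / np.linalg.norm(blended)  # project back onto the manifold

q = gated_quaternion([1, 0, 0, 0], [0.9, 0.1, 0.05, 0.0], gate=0.5)
print(round(float(np.linalg.norm(q)), 6))  # 1.0
```

The design choice this illustrates: a plain learned gate on raw sensor values can drift off the space of valid rotations, whereas re-projecting after every blend keeps noisy inputs well-formed by construction.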
Being-H0.5 demonstrates that scaling human-centric data is the key to unlocking true cross-embodiment generalization in robotics. For more on this comprehensive benchmark in robot learning, visit EmergentMind.com.