
Collision-Free Humanoid Traversal in Cluttered Indoor Scenes

Published 22 Jan 2026 in cs.RO | (2601.16035v1)

Abstract: We study the problem of collision-free humanoid traversal in cluttered indoor scenes, such as hurdling over objects scattered on the floor, crouching under low-hanging obstacles, or squeezing through narrow passages. To achieve this goal, the humanoid needs to map its perception of surrounding obstacles with diverse spatial layouts and geometries to the corresponding traversal skills. However, the lack of an effective representation that captures humanoid-obstacle relationships during collision avoidance makes directly learning such mappings difficult. We therefore propose Humanoid Potential Field (HumanoidPF), which encodes these relationships as collision-free motion directions, significantly facilitating RL-based traversal skill learning. We also find that HumanoidPF exhibits a surprisingly negligible sim-to-real gap as a perceptual representation. To further enable generalizable traversal skills through diverse and challenging cluttered indoor scenes, we further propose a hybrid scene generation method, incorporating crops of realistic 3D indoor scenes and procedurally synthesized obstacles. We successfully transfer our policy to the real world and develop a teleoperation system where users could command the humanoid to traverse in cluttered indoor scenes with just a single click. Extensive experiments are conducted in both simulation and the real world to validate the effectiveness of our method. Demos and code can be found in our website: https://axian12138.github.io/CAT/.

Summary

  • The paper introduces a Humanoid Potential Field framework that encodes per-body-part collision cues for robust humanoid navigation.
  • It integrates this field into reinforcement learning, using dense reward shaping and observation sampling to overcome local minima and sample inefficiency.
  • Scalable hybrid scene generation and policy distillation enable high success rates and low sim-to-real variance in complex indoor environments.

Collision-Free Humanoid Traversal in Cluttered Indoor Scenes: An Expert Review

Problem Formulation and Motivation

The paper "Collision-Free Humanoid Traversal in Cluttered Indoor Scenes" (2601.16035) addresses autonomous loco-navigation for humanoid robots in complex, cluttered indoor environments characterized by diverse obstacle geometries and spatial layouts. The challenge resides in enabling whole-body traversal behaviors—such as hurdling, crouching, squeezing through narrow passages, and negotiating overhead, ground, and lateral obstacles—without collisions. Prior solutions are hindered by limitations in perception representations, reward sparsity, insufficient anticipation of collisions, and lack of scenario-level generalization.

Humanoid Potential Field (HumanoidPF): Concept and Implementation

Informative Humanoid–Obstacle Representation

The paper advances perceptual and control representations by generalizing the classical Artificial Potential Field (APF) framework to multi-jointed humanoids. The HumanoidPF encodes the spatial relationship between robot and obstacles as a differentiable gradient field, allowing every body part to perceive dense, anticipatory collision-free motion directions. The attractive and repulsive field components make use of geodesic distances for obstacle-aware guidance and signed distances for collision avoidance, respectively. Priority weighting schemes (root-centric and dynamic urgency) ensure effective resolution of whole-body conflicts, overcoming local minima and oscillations prevalent in standard APF approaches.
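
To make the attractive/repulsive split concrete, the following is a minimal 2D grid sketch, not the authors' implementation. It assumes a boolean occupancy grid, uses breadth-first search as a stand-in for geodesic distance, and uses plain (unsigned) distance to the nearest occupied cell in place of a signed distance field. The function names, the `influence` radius, the `w_rep` weight, and the repulsive falloff are all hypothetical choices made for illustration.

```python
from collections import deque
import numpy as np

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def geodesic_distance(occupied, goal):
    """BFS path length (4-connected) from every free cell to the goal cell."""
    dist = np.full(occupied.shape, np.inf)
    dist[goal] = 0.0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < occupied.shape[0] and 0 <= nc < occupied.shape[1]
            if inside and not occupied[nr, nc] and np.isinf(dist[nr, nc]):
                dist[nr, nc] = dist[r, c] + 1.0
                queue.append((nr, nc))
    return dist

def attractive_direction(dist, cell):
    """Unit step toward the neighbour with the smallest geodesic distance."""
    r, c = cell
    best, step = dist[r, c], np.zeros(2)
    for dr, dc in NEIGHBOURS:
        nr, nc = r + dr, c + dc
        inside = 0 <= nr < dist.shape[0] and 0 <= nc < dist.shape[1]
        if inside and dist[nr, nc] < best:
            best, step = dist[nr, nc], np.array([dr, dc], dtype=float)
    return step

def repulsive_direction(occupied, cell, influence=3.0):
    """Push away from the nearest occupied cell; zero beyond `influence` cells."""
    obstacles = np.argwhere(occupied)
    if len(obstacles) == 0:
        return np.zeros(2)
    offsets = np.asarray(cell, dtype=float) - obstacles
    d = np.linalg.norm(offsets, axis=1)
    i = int(np.argmin(d))
    if d[i] == 0.0 or d[i] >= influence:
        return np.zeros(2)
    return (offsets[i] / d[i]) * (1.0 / d[i] - 1.0 / influence)

def field_direction(occupied, dist, cell, w_rep=1.0):
    """Normalised sum of the attractive and weighted repulsive components."""
    v = attractive_direction(dist, cell) + w_rep * repulsive_direction(occupied, cell)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# A wall in column 3 blocks rows 0..3, so the only route to the goal is via row 4.
occupied = np.zeros((5, 7), dtype=bool)
occupied[0:4, 3] = True
dist = geodesic_distance(occupied, goal=(0, 6))
direction = field_direction(occupied, dist, cell=(0, 2))
```

Because the attractive term follows path length around the wall rather than straight-line distance, a query point next to the wall is steered along the detour instead of into the obstacle, which is the property that helps avoid the local minima of classical APF. In a whole-body setting the same query would be evaluated once per body part.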

Integration with Reinforcement Learning

HumanoidPF serves two core roles in skill learning:

  • Policy Observation: At each timestep, HumanoidPF vectors sampled at 13 key body locations supply compact, task-relevant perceptual inputs. This explicit encoding of traversal guidance alleviates the sample inefficiency from implicit reasoning on raw visual data.
  • Reward Shaping: The policy's trajectory is encouraged to align with per-body-part preferred directions through a von Mises–Fisher distribution over the sphere, with concentration modulated by the HumanoidPF magnitude. This yields dense, anticipatory feedback that generalizes across scene topologies, in contrast to sparse collision-penalty heuristics. The continuous field aggregates spatial environment information and naturally suppresses perceptual noise, inducing low sim-to-real transfer variance.
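
The reward-shaping bullet above can be illustrated with a short sketch. The exact reward form is not given in this review, so the code assumes an unnormalised von Mises–Fisher kernel, exp(κ(μ·v − 1)), with κ proportional to the field magnitude at each body part. The function name `vmf_alignment_reward` and the constant `kappa_scale` are hypothetical.

```python
import numpy as np

def vmf_alignment_reward(velocities, field_vectors, kappa_scale=5.0, eps=1e-8):
    """Mean per-body-part alignment reward in (0, 1].

    velocities, field_vectors: arrays of shape (num_parts, 3).
    The field direction is the vMF mean direction; the field magnitude sets
    the concentration, so strong guidance penalises misalignment more.
    """
    velocities = np.asarray(velocities, dtype=float)
    field_vectors = np.asarray(field_vectors, dtype=float)
    speed = np.linalg.norm(velocities, axis=1)
    magnitude = np.linalg.norm(field_vectors, axis=1)
    v_hat = velocities / np.maximum(speed, eps)[:, None]
    mu = field_vectors / np.maximum(magnitude, eps)[:, None]
    cosine = np.sum(v_hat * mu, axis=1)
    kappa = kappa_scale * magnitude              # assumed linear modulation
    return float(np.exp(kappa * (cosine - 1.0)).mean())

# 13 body parts, all guided along +x (values are illustrative only).
field = np.tile([1.0, 0.0, 0.0], (13, 1))
aligned = vmf_alignment_reward(field, field)
opposed = vmf_alignment_reward(-field, field)
```

Under these assumptions a body part far from any obstacle (small field magnitude) is barely penalised for moving off-direction, while one close to a collision receives a sharply peaked reward around the recommended direction.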

Scalable Training and Generalization Procedures

Hybrid Scene Generation

Recognizing the inadequacy of naturalistic 3D scene datasets for capturing rare and challenging obstacle constellations, the authors propose a hybrid scene generation process. Crops from the 3D-FRONT dataset ensure structural realism, while procedural generation injects "extreme" spatial obstacles: overlapping full overhead, lateral, and ground constraints with high geometric complexity. Random SO(3) rotations and noise perturbations reinforce scene diversity. This approach directly expands the support of training curricula for RL, forcing the acquisition of robust collision-avoidance skills in emergency settings.
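
As an illustration of the rotation-and-noise step only, here is a small sketch that samples a uniformly random rotation from a normalised Gaussian quaternion and applies it, with vertex jitter, to an obstacle's vertices. The review does not describe how the authors sample rotations or scale the noise, so `random_rotation`, `perturb_obstacle`, and `noise_std` are hypothetical.

```python
import numpy as np

def random_rotation(rng):
    """Uniform random rotation matrix from a normalised 4D Gaussian quaternion."""
    q = rng.normal(size=4)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])

def perturb_obstacle(vertices, rng, noise_std=0.01):
    """Rotate vertices (N, 3) about their centroid, then add Gaussian jitter."""
    vertices = np.asarray(vertices, dtype=float)
    center = vertices.mean(axis=0)
    rotated = (vertices - center) @ random_rotation(rng).T + center
    return rotated + rng.normal(scale=noise_std, size=vertices.shape)

rng = np.random.default_rng(0)
box = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
variant = perturb_obstacle(box, rng)
```

Rotating about the centroid keeps the obstacle in place within the scene crop while varying its orientation, so the same procedural primitive can yield overhead, lateral, or ground constraints.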

Specialist-to-Generalist Policy Distillation

Sample inefficiency in RL across vast, complex scene distributions is addressed via specialist-to-generalist training. Large batches (~32,768 environments) per scene are used to train specialist policies with PPO. These are then distilled into a unified generalist via DAgger, leveraging behavioral cloning for cross-domain generalization. Both specialist and generalist policies are robustified with sensor and force noise and curriculum learning to maximize deployment transferability.
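
The distillation loop can be sketched on a toy problem. This is not the paper's training code: the specialists here are closed-form stand-ins rather than PPO-trained networks, the "scenes" are one-dimensional, and the generalist is a linear model fitted by least squares. What the sketch does preserve is the DAgger pattern: the student's own rollouts choose the states, and the per-scene expert labels them.

```python
import numpy as np

def features(x, scene, num_scenes):
    """Scene-conditioned observation: the state placed in that scene's slot."""
    f = np.zeros(num_scenes)
    f[scene] = x
    return f

def dagger_distill(specialists, iterations=5, horizon=20, seed=0):
    """Distil per-scene expert policies into one linear generalist."""
    rng = np.random.default_rng(seed)
    num_scenes = len(specialists)
    weights = np.zeros(num_scenes)       # generalist: action = weights @ features
    states, labels = [], []
    for _iteration in range(iterations):
        for scene, expert in enumerate(specialists):
            x = rng.uniform(-1.0, 1.0)
            for _step in range(horizon):
                f = features(x, scene, num_scenes)
                states.append(f)
                labels.append(expert(x))              # expert labels the student's state
                x = x + 0.1 * float(weights @ f)      # but the student's action is executed
        # Behavioural cloning on the aggregated dataset.
        weights, *_ = np.linalg.lstsq(np.array(states), np.array(labels), rcond=None)
    return weights

# Two hypothetical specialists with different gains.
specialists = [lambda x: -1.0 * x, lambda x: -2.0 * x]
generalist = dagger_distill(specialists)
```

Aggregating data across iterations means the generalist is trained on states it actually visits, which is the reason DAgger is usually preferred over one-shot behavioural cloning when the student's errors would otherwise compound.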

Real-World Application: Click-and-Traverse (CAT) System

The CAT deployment integrates real-time SLAM (Fast-LIO2) and volumetric mapping (OctoMap) at 10 Hz, using HumanoidPF to dynamically update obstacle-guidance fields. A user interface allows goal specification with a single click, so the operator does not need to teleoperate the robot's motion directly. The approach demonstrates effective sim-to-real transfer on the Unitree G1 platform and exhibits strong robustness in highly variable, cluttered indoor settings.

Experimental Evaluation

Performance Benchmarks

  • Scene Diversity: Eight manually designed types of cluttered scenes were used to characterize performance across the obstacle spectrum.
  • Metrics: Success Rate (SR, %) and Distance Error (DE, m) measured collision-free goal attainment and proximity.
  • Comparisons: ASTraversal (quadruped-centric, elevation maps), Humanoid Parkour (terrain-focused collision-penalty), and ablations lacking HumanoidPF for either observation or reward were tested.
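
For readers who want to reproduce the two metrics, a minimal sketch follows. It assumes an episode counts as successful when the robot finishes within a tolerance of the goal without any collision, and that DE is the mean final distance to the goal. The `goal_tolerance` value and the function name `evaluate` are hypothetical and not taken from the paper.

```python
import numpy as np

def evaluate(final_positions, goals, collided, goal_tolerance=0.2):
    """Return (success rate in %, mean distance error in metres)."""
    final_positions = np.asarray(final_positions, dtype=float)
    goals = np.asarray(goals, dtype=float)
    collided = np.asarray(collided, dtype=bool)
    distance = np.linalg.norm(final_positions - goals, axis=1)
    success = (distance <= goal_tolerance) & ~collided
    return 100.0 * success.mean(), distance.mean()

# Four illustrative episodes, all with the goal at the origin.
finals = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.1], [0.0, 0.0]]
goals = [[0.0, 0.0]] * 4
collided = [False, False, False, True]
sr, de = evaluate(finals, goals, collided)
```

Note that under this definition the fourth episode reaches the goal but still counts as a failure because it collided, which is why SR and DE are reported together.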

The proposed method yielded SR > 90% and DE < 0.1 m on most scene types, substantially outperforming the baselines, especially in scenarios with high spatial intricacy and combined constraints. Moreover, variance across trials was markedly lower, indicating stable policy behavior.

Generalization and Sim-to-Real Transfer

Zero-shot traversal in 30 artist-designed indoor scenes demonstrated that increasing procedural obstacle complexity during training led to substantial improvements: SR improved from 62% (base dataset) to 95% (full-hard hybrid). Sim-to-real transfer experiments employed voxel grid and multi-layer elevation map baselines, with HumanoidPF showing superior stability and collision avoidance, especially where sensor noise and environment perturbations were present.

Theoretical and Practical Implications

The study establishes that embedding dense, anticipatory collision-avoidance information via HumanoidPF into both policy perception and reward enables RL frameworks to efficiently learn robust whole-body traversal skills. This principle is generalizable to high-DOF robots in environments with complex, full-spatial constraints. The results suggest that field-derived dense reward supervision and field-based perception substantially narrow the sim-to-real gap, which points to applicability in real-world service robotics and domestic humanoid deployment.

Practically, the hybrid scene generation and CAT interface set useful precedents for scalable RL dataset construction and deployment-friendly operator interfaces. The approach points to promising future directions: full contact-rich interaction modeling, extension to unstructured or highly deformable environments, and integration with semantic mapping and object affordance recognition.

Conclusion

The HumanoidPF framework enables high-fidelity, collision-free traversal for humanoid robots across cluttered indoor scenes, outperforming prevailing methods in both simulation and real-world settings. By reformulating obstacle guidance with per-body-part APF fields and employing scalable RL strategies supported by rich scene generation, the solution advances autonomous humanoid navigation toward practical deployment. Open challenges remain in further generalization, contact-rich skill learning, and scaling to more complex, dynamic environmental conditions.
