Learning Part-Aware Dense 3D Feature Field for Generalizable Articulated Object Manipulation

Published 15 Feb 2026 in cs.RO, cs.CV, and cs.LG | (2602.14193v1)

Abstract: Articulated object manipulation is essential for various real-world robotic tasks, yet generalizing across diverse objects remains a major challenge. A key to generalization lies in understanding functional parts (e.g., door handles and knobs), which indicate where and how to manipulate across diverse object categories and shapes. Previous works attempted to achieve generalization by introducing foundation features, while these features are mostly 2D-based and do not specifically consider functional parts. When lifting these 2D features to geometry-profound 3D space, challenges arise, such as long runtimes, multi-view inconsistencies, and low spatial resolution with insufficient geometric information. To address these issues, we propose Part-Aware 3D Feature Field (PA3FF), a novel dense 3D feature with part awareness for generalizable articulated object manipulation. PA3FF is trained by 3D part proposals from a large-scale labeled dataset, via a contrastive learning formulation. Given point clouds as input, PA3FF predicts a continuous 3D feature field in a feedforward manner, where the distance between point features reflects the proximity of functional parts: points with similar features are more likely to belong to the same part. Building on this feature, we introduce the Part-Aware Diffusion Policy (PADP), an imitation learning framework aimed at enhancing sample efficiency and generalization for robotic manipulation. We evaluate PADP on several simulated and real-world tasks, demonstrating that PA3FF consistently outperforms a range of 2D and 3D representations in manipulation scenarios, including CLIP, DINOv2, and Grounded-SAM. Beyond imitation learning, PA3FF enables diverse downstream methods, including correspondence learning and segmentation tasks, making it a versatile foundation for robotic manipulation. Project page: https://pa3ff.github.io

Summary

  • The paper introduces PA3FF, a novel framework that integrates dense, part-aware 3D features to enhance generalization in robotic manipulation.
  • It employs contrastive learning and a diffusion policy (PADP) to efficiently predict action sequences from 3D point cloud data.
  • The method outperforms baselines on tasks like the PartInstruct benchmark, proving versatile for various downstream applications in robotics.

Overview of Part-Aware Dense 3D Feature Field for Generalizable Articulated Object Manipulation

The paper "Learning Part-Aware Dense 3D Feature Field for Generalizable Articulated Object Manipulation" (2602.14193) introduces PA3FF (Part-Aware 3D Feature Field), a novel 3D representation for robotic manipulation of articulated objects. The primary innovation is the integration of dense, semantic, part-aware features within a continuous 3D space using point clouds. This approach addresses limitations in existing 2D-based and lifted 3D feature representations, notably enhancing generalization across diverse object categories and shapes, which is crucial for tasks involving functional parts such as handles or knobs.

Methodology

Part-Aware 3D Feature Field (PA3FF)

PA3FF is trained with a contrastive learning framework to achieve part awareness. It leverages 3D part proposals from a large-scale labeled dataset and incorporates 3D geometric cues through the pre-trained Sonata model. The feature field assigns a latent feature vector to each point in the point cloud such that points with similar features are likely to belong to the same functional part, providing the detailed semantic understanding needed for manipulation.

Part-Aware Diffusion Policy (PADP)

PADP, built upon PA3FF, enhances sample efficiency and generalization in robotic manipulation. Integrating PA3FF with a diffusion policy architecture allows the model to predict action sequences from 3D point cloud observations and robot states. Using a Denoising Diffusion Implicit Model (DDIM) for inference, PADP produces manipulation actions efficiently while avoiding common limitations of other imitation learning frameworks, such as reliance on keyframe prediction.
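The deterministic DDIM inference loop at the heart of such a policy can be sketched as follows. Here `eps_model` stands in for the conditional noise-prediction network (conditioned on pooled PA3FF features and robot state), and `alphas_bar` is the cumulative noise schedule; both names and the schedule are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def ddim_sample(eps_model, obs_feat, action_dim, horizon, alphas_bar, rng):
    """Deterministic DDIM (eta=0) sampling of an action sequence,
    conditioned on an observation feature `obs_feat`.

    `eps_model(x_t, t, obs_feat)` predicts the noise at diffusion step t.
    Illustrative sketch of the inference loop, not the paper's network.
    """
    x = rng.standard_normal((horizon, action_dim))  # start from Gaussian noise
    T = len(alphas_bar)
    for t in range(T - 1, -1, -1):
        a_t = alphas_bar[t]
        a_prev = alphas_bar[t - 1] if t > 0 else 1.0
        eps = eps_model(x, t, obs_feat)
        # predicted clean action sequence from the current noisy estimate
        x0 = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # deterministic DDIM update toward the previous noise level
        x = np.sqrt(a_prev) * x0 + np.sqrt(1.0 - a_prev) * eps
    return x
```

Because the update is deterministic, DDIM can skip steps of the training schedule at inference time, which is what makes diffusion policies fast enough for closed-loop control.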

Experimental Evaluation

The paper presents extensive evaluations of PA3FF and PADP in both simulated and real-world environments. Compared with previous representations such as CLIP, DINOv2, and Grounded-SAM, PADP demonstrates superior sample efficiency and generalization. Quantitative analyses show significant improvements, with PADP achieving state-of-the-art performance on the PartInstruct benchmark as well as on various real-world tasks.

Comparison with Baselines

The experiments benchmark PADP against multiple baselines, including Act3D, RVT2, and GenDP, and consistently show higher success rates in manipulation tasks under varied conditions. This underscores PA3FF's effectiveness in maintaining feature consistency across novel object configurations, outperforming both traditional and modern 2D and 3D representations.

Downstream Applications

Beyond imitation learning, PA3FF facilitates several downstream applications such as correspondence learning, segmentation tasks, and part decomposition. The consistency and semantic richness of the feature fields make PA3FF adaptable to various robotic manipulation scenarios, highlighting its versatility as a foundational model.
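The correspondence use case reduces to nearest-neighbor matching in feature space: because part-aware features are consistent across object instances, the best feature match for a point on one object's handle tends to lie on the other object's handle. A minimal sketch, assuming per-point feature matrices as input (the function name is a placeholder, not the paper's API):

```python
import numpy as np

def part_correspondence(feat_src, feat_tgt):
    """Match each source point to the target point with the most similar
    feature (cosine similarity). With part-aware features, matches tend to
    land on the same functional part across different objects.
    Illustrative sketch only.
    """
    a = feat_src / np.linalg.norm(feat_src, axis=1, keepdims=True)
    b = feat_tgt / np.linalg.norm(feat_tgt, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)  # best target index per source point
```

The same similarity structure supports segmentation: clustering the per-point features (e.g. with k-means) groups points into functional parts without extra supervision.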

Implications and Future Directions

The development of PA3FF marks a significant step in robotic manipulation capabilities, particularly in handling articulated objects with complex interactions. By focusing on part-aware representations, the paper sets a precedent for future research to explore more sophisticated models that enhance the robustness and adaptability of robotic systems. Potential future directions include expanding the framework to handle deformable objects that exhibit complex structural changes, an area where current models face substantial challenges.

Conclusion

The paper offers a comprehensive exploration of PA3FF and PADP, showcasing their capability to address longstanding challenges in generalizable articulated object manipulation. Through innovation in 3D feature representation and diffusion-based policy learning, the research significantly advances the field of robotics, paving the way for more adaptive and intelligent robotic systems.
