SparseDFF: Sparse-View Feature Distillation for One-Shot Dexterous Manipulation
Abstract: Humans demonstrate remarkable skill in transferring manipulation abilities across objects of varying shapes, poses, and appearances, a capability rooted in their understanding of semantic correspondences between different instances. To equip robots with a similar high-level comprehension, we present SparseDFF, a novel distilled feature field (DFF) for 3D scenes that leverages large 2D vision models to extract semantic features from sparse RGBD images, a setting where research remains limited despite its relevance to many tasks with fixed-camera setups. SparseDFF generates view-consistent 3D DFFs by mapping image features onto a 3D point cloud, enabling efficient one-shot learning of dexterous manipulations. Central to SparseDFF is a feature refinement network, optimized with a contrastive loss between views, together with a point-pruning mechanism that promotes feature continuity. The resulting field allows feature discrepancies to be minimized with respect to end-effector parameters, bridging the demonstration and the target manipulation. Validated in real-world experiments with a dexterous hand, SparseDFF proves effective at manipulating both rigid and deformable objects, demonstrating significant generalization across object and scene variations.
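The one-shot transfer idea in the abstract — sample the target scene's feature field at end-effector keypoints and minimize the discrepancy against the features recorded in the demonstration — can be illustrated with a toy sketch. This is not the paper's implementation: the feature field is faked with random per-point features, the end-effector pose is reduced to a single translation, and the optimizer is a coarse random search; all names (`feature_at`, `energy`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a distilled feature field: every 3D point carries a
# D-dim semantic feature. (In SparseDFF these features come from a large
# 2D vision model, back-projected from sparse RGBD views and refined.)
def feature_at(queries, field_points, field_feats):
    """Sample the field at query points via nearest-neighbor lookup."""
    dists = np.linalg.norm(field_points[None, :, :] - queries[:, None, :], axis=-1)
    return field_feats[dists.argmin(axis=1)]

# Demonstration: end-effector keypoints and the features observed there.
demo_keypoints = rng.normal(size=(5, 3))
demo_feats = rng.normal(size=(5, 8))

# Target scene: a new point cloud with its own feature field.
target_points = rng.normal(size=(200, 3))
target_feats = rng.normal(size=(200, 8))

def energy(translation):
    """Feature discrepancy between the demo features and the target
    field sampled at the translated keypoints. A single translation
    stands in for the full end-effector parameters optimized in the
    paper."""
    moved = demo_keypoints + translation
    sampled = feature_at(moved, target_points, target_feats)
    return float(np.sum((sampled - demo_feats) ** 2))

# Coarse random search as a stand-in for gradient-based optimization
# of the end-effector pose.
candidates = [rng.normal(scale=0.5, size=3) for _ in range(64)]
best = min(candidates, key=energy)
```

In the actual method the energy is differentiable in the end-effector parameters, so the pose is refined by gradient descent rather than search; the sketch only conveys the shape of the objective.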