
Feel the Force: Contact-Driven Learning from Humans

Published 2 Jun 2025 in cs.RO and cs.AI | (2506.01944v1)

Abstract: Controlling fine-grained forces during manipulation remains a core challenge in robotics. While robot policies learned from robot-collected data or simulation show promise, they struggle to generalize across the diverse range of real-world interactions. Learning directly from humans offers a scalable solution, enabling demonstrators to perform skills in their natural embodiment and in everyday environments. However, visual demonstrations alone lack the information needed to infer precise contact forces. We present FeelTheForce (FTF): a robot learning system that models human tactile behavior to learn force-sensitive manipulation. Using a tactile glove to measure contact forces and a vision-based model to estimate hand pose, we train a closed-loop policy that continuously predicts the forces needed for manipulation. This policy is re-targeted to a Franka Panda robot with tactile gripper sensors using shared visual and action representations. At execution, a PD controller modulates gripper closure to track predicted forces, enabling precise, force-aware control. Our approach grounds robust low-level force control in scalable human supervision, achieving a 77% success rate across 5 force-sensitive manipulation tasks. Code and videos are available at https://feel-the-force-ftf.github.io.

Summary

  • The paper introduces FEELTHEFORCE, a contact-driven imitation learning framework that leverages human tactile-proprioceptive demonstrations to train robots.
  • It employs a transformer-based policy with PD control for real-time force stabilization, achieving a 77% mean success rate across various manipulation tasks.
  • The method demonstrates robust generalization and resilience to sensor noise and disturbances, eliminating reliance on large-scale teleoperation datasets.

Contact-Driven Imitation Learning: The FEELTHEFORCE Framework

Introduction

Precise force-aware robot manipulation remains one of the central unsolved challenges in robotics, particularly for tasks requiring real-time adjustment of fine-grained contact forces amidst sensor noise and embodiment discrepancies. "Feel the Force: Contact-Driven Learning from Humans" (2506.01944) introduces FEELTHEFORCE (FTF), a contact-driven imitation learning method that leverages direct human tactile-proprioceptive signals instead of conventional robot-collected or simulation data. By utilizing a tactile-sensing glove with ergonomic design, combined with vision-based pose tracking, FTF provides a scalable approach to policy learning for force-sensitive tasks without dependency on expensive teleoperation or large-scale robot interaction datasets.

Methodology

Tactile Data Acquisition

FTF collects human manipulation demonstrations via custom tactile gloves equipped with 3D-printed AnySkin magnetometer-based force sensors. These sensors, positioned strategically for minimal manipulation interference, generate high-fidelity 3D force vectors at 200 Hz. Multimodal data streams, including both force sensors and stereo RGB video from calibrated RealSense cameras, are temporally aligned and used to capture the tactile and kinematic signatures of human interaction. The glove design ensures close mapping to the robot embodiment, using similar tactile sensors in the gripper of a Franka Panda arm.
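Since the force sensors run at 200 Hz while camera streams run much slower, the modalities must be temporally aligned before training. The paper does not detail its alignment procedure; the sketch below shows one common approach, nearest-timestamp matching, with purely illustrative rates and helper names.

```python
# Hypothetical sketch: align a 200 Hz force stream to ~30 Hz camera
# frames by nearest timestamp. The actual alignment procedure used in
# the paper may differ; rates and function names are illustrative.
import bisect

def align_to_frames(frame_ts, force_ts, force_vals):
    """For each camera timestamp, pick the force sample closest in time."""
    aligned = []
    for t in frame_ts:
        i = bisect.bisect_left(force_ts, t)
        # Compare the neighbors on either side of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(force_ts)]
        j = min(candidates, key=lambda k: abs(force_ts[k] - t))
        aligned.append(force_vals[j])
    return aligned

# 200 Hz force samples (5 ms apart) vs. ~30 Hz frames (33.3 ms apart).
force_ts = [k * 0.005 for k in range(200)]
force_vals = [float(k) for k in range(200)]
frame_ts = [k * 0.0333 for k in range(30)]
aligned = align_to_frames(frame_ts, force_ts, force_vals)
```

A production pipeline would typically interpolate rather than snap to the nearest sample, but nearest-neighbor matching keeps the sketch simple.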

Unified Key Point-based Representation

To bridge the morphological gap between human demonstrators and robotic hardware, the system employs a unified key point-based representation. Human hand keypoints are triangulated from image-space to 3D using MediaPipe and stereo geometry. Critical object keypoints are sparsely annotated, then tracked across trajectories via Co-Tracker and semantically propagated with DIFT, grounded in the robot’s frame.
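Lifting MediaPipe's 2D hand keypoints into 3D with stereo geometry amounts to standard linear (DLT) triangulation. The sketch below illustrates the idea with made-up projection matrices; the paper's calibration values are of course different.

```python
# Hypothetical sketch of triangulating a 2D hand keypoint (e.g. from
# MediaPipe) into 3D using two calibrated cameras via linear (DLT)
# triangulation. The projection matrices are illustrative, not from
# the paper's calibration.
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear triangulation: solve A x = 0 for the homogeneous 3D point."""
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.stack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    # The null space of A (last right-singular vector) is the solution.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Two toy pinhole cameras: identity intrinsics, second camera offset
# by a 1 m baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 2.0])
uv1 = (X_true[0] / X_true[2], X_true[1] / X_true[2])
uv2 = ((X_true[0] - 1.0) / X_true[2], X_true[1] / X_true[2])
X_hat = triangulate(P1, P2, uv1, uv2)
```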

Policy Learning and Control Architecture

A transformer-based policy model receives as input temporal histories of robot and object positional keypoints, the binarized gripper state, and continuous force values. The model outputs future trajectories for both the robot end effector and the predicted target force. The transformer leverages action chunking with exponential temporal averaging to mitigate frame jitter and ensure stable trajectories.
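The chunking-plus-averaging scheme above can be sketched as ACT-style temporal ensembling: at each timestep, all overlapping chunk predictions for that step are combined with exponentially decaying weights. The decay rate, chunk size, and scalar actions below are illustrative, not the paper's values.

```python
# Hypothetical sketch of exponential temporal averaging over
# overlapping action chunks (ACT-style temporal ensembling).
# Decay rate m and the toy scalar actions are illustrative.
import math

def temporal_ensemble(chunk_predictions, t, m=0.1):
    """Average all chunk predictions covering timestep t.

    chunk_predictions maps chunk start step -> list of actions.
    Older predictions get exponentially smaller weight exp(-m * age).
    """
    actions, weights = [], []
    for start, chunk in sorted(chunk_predictions.items()):
        if start <= t < start + len(chunk):
            actions.append(chunk[t - start])
            weights.append(math.exp(-m * (t - start)))
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, actions)) / total

# Three overlapping chunks predicted at steps 0, 1, and 2.
chunks = {0: [1.0, 1.0, 1.0], 1: [2.0, 2.0, 2.0], 2: [4.0, 4.0, 4.0]}
a2 = temporal_ensemble(chunks, t=2)
```

Because every executed action blends several predictions, a single jittery chunk cannot yank the trajectory, which is the smoothing effect the summary refers to.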

Force control at execution is decoupled from high-level policy prediction, with a PD-based outer-loop controller modulating the gripper to match the continuous target force output by the transformer policy. The force setpoint is updated iteratively until the measured force reaches within a threshold ε of the predicted force, after which the policy rollout advances. This architecture enables closed-loop, real-time force stabilization, facilitating robust behavior even under distributional shifts.
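The outer control loop can be sketched as follows. Gains, the convergence threshold, and the linear toy gripper model are all illustrative assumptions, not the paper's controller parameters.

```python
# Hypothetical sketch of the outer-loop PD scheme: the gripper width
# command is adjusted until the measured force is within a threshold
# eps of the policy's target force. Gains, eps, and the toy gripper
# model are illustrative.
def track_force(target_f, read_force, width, kp=0.002, kd=0.0005,
                eps=0.05, max_iters=200):
    """Iteratively tighten/loosen the gripper to match target force."""
    prev_err = 0.0
    for _ in range(max_iters):
        err = target_f - read_force(width)
        if abs(err) < eps:
            break
        # Positive error (too little force) -> close the gripper further.
        width -= kp * err + kd * (err - prev_err)
        prev_err = err
    return width

# Toy gripper: measured force grows linearly once width shrinks past
# the contact point at 0.04 m (purely for illustration).
def toy_force(width, contact=0.04, stiffness=100.0):
    return max(0.0, (contact - width) * stiffness)

w = track_force(target_f=1.0, read_force=toy_force, width=0.05)
```

Only once the loop settles within eps does the policy rollout advance to the next predicted setpoint, which is what makes the force tracking closed-loop.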

Experimental Evaluation

Benchmarks and Tasks

FTF is evaluated on five force-sensitive, real-world manipulation tasks on the Franka Panda: placing soft bread on a plate, unstacking a single plastic cup, placing an egg in a pot, placing a bag of chips on a plate, and twisting/lifting a bottle cap. The tasks were selected to probe the system’s handling of both rigid and highly deformable objects under varying initial conditions and required force precision.

Baselines encompass (i) vision- and tactile-based transformer approaches using passive force integration from human demonstration, (ii) continuous and binary gripper mappings, and (iii) policy learning from teleoperated robot data (P3-PO and variants). Each method is tested with 30 demonstrations per task.

Results and Analysis

FTF reports a mean success rate of 77% across all tasks, outperforming all baselines, including those using direct robot teleoperation and passive tactile integration. Notably:

  • On tasks like unstacking cups and handling delicate objects (bread, egg, chips), FTF is the only approach achieving a high success rate, highlighting the inefficacy of naive binary/continuous gripper mapping and the critical role of active force prediction and modulation.
  • For the twist-and-lift bottle cap task, FTF achieves 13/15 successes, avoiding hard-grip-induced failures observed in binary gripper baselines.
  • Baseline methods, particularly those utilizing continuous closure mapping from human to robot, suffer from sample inefficiency and fail to generalize across varying task requirements.
  • When subjected to adversarial test-time disturbances (e.g., external force perturbations during bag lifting), FTF maintains 67% task success, demonstrating robustness to distributional shift in tactile feedback.

Ablation experiments show that masking force measurements during policy training does not degrade FTF's prediction or performance, indicating that the model effectively infers force requirements from environment and state.
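The masking ablation amounts to zeroing the force channel of training inputs so the policy must infer the required force from keypoints and state alone. A minimal sketch, assuming a dictionary-style training example and an illustrative mask probability (neither is specified in the summary):

```python
# Hypothetical sketch of the force-masking ablation: with some
# probability, the force channel of a training example is zeroed so
# the policy must rely on visual keypoints and state. The example
# layout and mask probability are illustrative.
import random

def mask_force(example, p_mask=0.5, rng=random):
    """Return a copy of the example with its force reading zeroed
    with probability p_mask."""
    masked = dict(example)
    if rng.random() < p_mask:
        masked["force"] = [0.0] * len(example["force"])
    return masked

ex = {"keypoints": [[0.1, 0.2, 0.3]], "force": [1.5, 0.2, -0.3]}
out = mask_force(ex, p_mask=1.0)
```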

Implications and Future Directions

This work provides strong empirical evidence that active prediction and reproduction of human-derived contact forces, rather than passive force signal integration, dramatically improves generalization and robustness in force-sensitive manipulation. The method eliminates the need for large-scale robot interaction or costly haptic teleoperation systems, democratizing policy learning for contact-rich tasks by leveraging scalable human demonstration pipelines.

Theoretically, FTF demonstrates that keypoint-based, embodiment-agnostic observation and action spaces support efficient human-to-robot skill transfer at the tactile level. Practically, the proposed architecture offers a path toward robot learning pipelines that exploit unconstrained, natural human tactile data for real-world manipulation.

Outstanding limitations include:

  • Loss of force directionality due to aggregation of shear and normal forces, which restricts dexterous manipulation with high DOF hands.
  • Dependence on static, calibrated camera setups; generalization to uncalibrated or egocentric settings is a critical direction for scaling data collection.
  • Current evaluation is on single-arm, single-gripper tasks; extending to soft robotics, bimanual coordination, or general-purpose tactile reasoning remains open.

Advances in tactile sensing hardware, dense point cloud tracking, and self-supervised in-the-wild data acquisition promise to further close the domain gap between human and robot manipulation, enabling broader deployment of force-sensitive robots in complex, unconstrained environments.

Conclusion

FEELTHEFORCE provides a rigorous, scalable framework for contact-driven robot learning directly from human tactile demonstration. By coupling transformer-based imitation learning with active force prediction and PD control, it achieves markedly superior performance and robustness over existing baselines in force-sensitive tasks. The results and architectural principles elucidated by this work supply a foundation for future efforts in learning generalizable, tactile-aware robotic manipulation, and underscore the value of leveraging natural human touch as a signal for robotic skill acquisition.
