ContactRL: Safe Reinforcement Learning based Motion Planning for Contact based Human Robot Collaboration

Published 3 Dec 2025 in cs.RO | (2512.03707v1)

Abstract: In collaborative human-robot tasks, safety requires not only avoiding collisions but also ensuring safe, intentional physical contact. We present ContactRL, a reinforcement learning (RL) based framework that directly incorporates contact safety into the reward function through force feedback. This enables a robot to learn adaptive motion profiles that minimize human-robot contact forces while maintaining task efficiency. In simulation, ContactRL achieves a low safety violation rate of 0.2% with a high task success rate of 87.7%, outperforming state-of-the-art constrained RL baselines. In order to guarantee deployment safety, we augment the learned policy with a kinetic energy based Control Barrier Function (eCBF) shield. Real-world experiments on a UR3e robotic platform performing small object handovers from a human hand across 360 trials confirm safe contact, with measured normal forces consistently below 10 N. These results demonstrate that ContactRL enables safe and efficient physical collaboration, thereby advancing the deployment of collaborative robots in contact-rich tasks.

Summary

  • The paper presents ContactRL, a safe reinforcement learning framework that integrates real force feedback with a kinetic energy-based control barrier function for contact-rich tasks.
  • It employs a Soft Actor-Critic algorithm with a balanced reward design to achieve an 87.17% task success rate and a 0.20% safety violation rate in simulation.
  • The framework produces smooth, time-efficient trajectories that keep peak contact forces below 10 N, supporting human comfort and safety during interaction.

Safe RL-Based Motion Planning for Human-Robot Close-Contact Collaboration: An Expert Analysis of ContactRL

Introduction and Problem Formulation

Physical human-robot collaboration, particularly in the context of contact-rich manipulation tasks, presents non-trivial safety and control challenges. In applications such as small object handovers—where a robot directly grasps items from a human palm—interaction safety cannot be reduced to mere collision avoidance. Rather, safe intentional physical contact, characterized by bounded contact forces, is critical. "ContactRL: Safe Reinforcement Learning based Motion Planning for Contact based Human Robot Collaboration" (2512.03707) presents a reinforcement learning (RL) framework that explicitly addresses safety in contact-rich collaborative tasks by integrating real force feedback into policy training and deploying control barrier function (CBF)-based runtime safety filtering.

The framework directly tackles the objective of generating time-efficient trajectories that guarantee terminal safety constraints in terms of peak normal contact force, thereby ensuring human comfort and minimizing injury risk.

Figure 1: The task involves computing a trajectory that minimizes completion time while guaranteeing the terminal contact force on the hand remains below a safe threshold (F_τ).

ContactRL Framework Overview

ContactRL consists of a model-free RL pipeline augmented with force feedback and a kinetic energy-based control barrier function safety shield. The RL agent observes robot and human hand poses, and the end-effector’s velocity, and it produces smooth, bounded displacement commands in Cartesian space.
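As a concrete illustration of this interface, a minimal sketch of the observation vector and bounded displacement command follows; the dimensions (7-D poses, 3-D velocity) and the per-step bound are assumptions for illustration, not the paper's exact values.

```python
import numpy as np

# Hypothetical dimensions and bounds for illustration only; the paper
# does not specify these exact values here.
ACTION_BOUND = 0.01  # max per-step Cartesian displacement (m), assumed


def build_observation(ee_pose, hand_pose, ee_velocity):
    """Stack end-effector pose, human hand pose, and EE velocity."""
    return np.concatenate([ee_pose, hand_pose, ee_velocity])


def bounded_action(raw_action):
    """Clip the raw policy output to a bounded displacement command."""
    return np.clip(raw_action, -ACTION_BOUND, ACTION_BOUND)


# Example: 7-D poses (position + quaternion) and 3-D linear velocity
obs = build_observation(np.zeros(7), np.zeros(7), np.zeros(3))
delta = bounded_action(np.array([0.5, -0.002, 0.03]))
```

Clipping at the action level is one simple way to keep commanded displacements smooth and bounded before they reach inverse kinematics.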

The reward function is richly structured, trading off task completion, contact safety, jerk minimization, and dynamic proximity-aware scaling. Force feedback is obtained from the simulated contact model, with a dedicated term incentivizing the agent to keep contact forces below the preset safety threshold (F_τ). Jerk and proximity terms further promote smooth approach behaviors and efficient motion.
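The shaping terms described above can be sketched as a single scalar reward; the weights and functional forms below are illustrative assumptions, with only the 10 N threshold taken from the paper.

```python
import numpy as np

F_TAU = 10.0  # safety force threshold (N), from the paper


def reward(dist_to_goal, contact_force, jerk, prev_dist,
           w_reach=1.0, w_safe=5.0, w_jerk=0.01):
    """Illustrative reward combining reach progress, a contact-safety
    penalty, jerk minimization, and proximity-aware scaling. The weights
    and forms are assumptions, not the paper's exact formulation."""
    r_reach = prev_dist - dist_to_goal                   # progress toward goal
    r_safe = -w_safe * max(0.0, contact_force - F_TAU)   # penalize unsafe force
    r_jerk = -w_jerk * abs(jerk)                         # promote smoothness
    proximity_scale = 1.0 / (1.0 + dist_to_goal)         # proximity-aware term
    return proximity_scale * (w_reach * r_reach + r_safe + r_jerk)
```

Under this shaping, an approach step with safe contact earns positive reward, while the same step with excessive force is sharply penalized.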

Figure 2: The overall ContactRL architecture integrates visual information for state estimation, uses force feedback for reward shaping, and governs robot motion through policy outputs and inverse kinematics; safety is enforced via kinetic energy-based CBF.

Simulation is performed in PyBullet on a UR3e manipulator, using a flat-plane hand model with physically realistic mass and friction attributes for direct force modeling.

Figure 3: The simulated testbed includes the UR3e manipulator, a graspable object, and a reduced-complexity plane-based hand model for efficient and representative training of contact dynamics.

Policy Training and Shielded Deployment

ContactRL uses Soft Actor-Critic (SAC) for robust training in continuous action spaces. Extensive ablation over reward composition demonstrates that careful balancing between reach, safety, jerk, and proximity yields the lowest safety violation incidence with high task efficiency.

A kinetic energy-based CBF, deployed as a runtime safety shield, operates by constraining the instantaneous kinetic energy of the end-effector, ensuring that the projected velocity profile cannot result in forces exceeding human-safe thresholds. The shield operates as a real-time QP, utilizing forward-invariance of an energy set to provide a theoretically grounded safety envelope over all policy outputs, regardless of stochastic policy variation or real-world perturbation.
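For a single end-effector energy constraint, the QP projection reduces to a closed-form velocity scaling, which can be sketched as follows; the effective mass and energy budget are assumed values, not the paper's.

```python
import numpy as np

def ecbf_shield(v_cmd, m_eff=2.0, e_max=0.1):
    """Project a commanded end-effector velocity onto the safe set
    {v : 0.5 * m_eff * |v|^2 <= e_max}. For this single norm constraint
    the minimal-deviation QP has a closed-form solution: uniform scaling
    of the velocity. m_eff (kg) and e_max (J) are assumed values."""
    ke = 0.5 * m_eff * float(np.dot(v_cmd, v_cmd))
    if ke <= e_max:
        return v_cmd  # already inside the safe energy set
    v_max = np.sqrt(2.0 * e_max / m_eff)  # speed at the energy boundary
    return v_cmd * (v_max / np.linalg.norm(v_cmd))
```

Because the shield only rescales the commanded velocity, it preserves the policy's motion direction while capping the kinetic energy that could be transferred on contact.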

Figure 4: Reward design ablation demonstrates that a balanced reward (RF5) achieves minimal safety violations and optimal completion times, confirming the necessity of integrated safety terms.

The effectiveness of the shield is confirmed in simulated trajectories, where it eliminates kinetic energy and velocity spikes associated with rare but severe safety violations caused by high-frequency policy noise, which conventional low-pass filtering cannot prevent.

Figure 5: (Top) The unshielded policy exhibits stochasticity-induced kinetic energy and speed spikes; (Bottom) the eCBF-shielded policy enforces smooth dissipation and precludes excessive contact energy.

Simulation and Real-World Experimental Results

ContactRL has been thoroughly benchmarked against state-of-the-art constrained RL algorithms, including Constrained Policy Optimization (CPO) and SAC-Lagrangian. It achieves substantial performance gains:

  • Task success rate: 87.17% (ContactRL), markedly exceeding CPO (33.33%) and SACLag (55.60%)
  • Safety violation rate: 0.20%, matching or below all baselines (roughly 2 violations per 1,000 trials)
  • Contact force: maintains a mean peak force of ≈ 10 N, with smaller fluctuations than the other baselines
  • Motion smoothness: RMS jerk substantially lower than alternatives (931.3 m/s³ vs. 1355–3924 m/s³)
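The RMS-jerk smoothness metric above can be computed from sampled positions with a third finite difference; the sampling interval below is an assumption, not the paper's stated rate.

```python
import numpy as np

def rms_jerk(positions, dt=0.01):
    """Root-mean-square jerk (m/s^3) of a sampled 1-D position trajectory,
    estimated via the third finite difference. dt is the (assumed)
    sampling interval in seconds."""
    jerk = np.diff(positions, n=3, axis=0) / dt ** 3
    return float(np.sqrt(np.mean(jerk ** 2)))

# Example: a constant-acceleration path has (numerically) zero jerk,
# while a cubic path has constant nonzero jerk.
t = np.arange(200) * 0.01
smooth_path = 0.5 * 3.0 * t ** 2   # a = 3 m/s^2, jerk = 0
cubic_path = t ** 3                # jerk = 6 m/s^3
```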

Comprehensive real-world validation was conducted with a UR3e robot equipped with a force/torque sensor and gripper, evaluating object handover to and from 12 human subjects across 360 trials.

Figure 6: The physical experiment platform features the UR3e, a parallel gripper, and force/torque sensing for detailed assessment of contact forces during handover across diverse objects.

ContactRL, shielded with eCBF, consistently achieved maximum contact forces below 10 N in all physical trials—well within established tactile comfort limits. Object type and grasping pose significantly modulated force profiles, but the framework’s explicit safety envelope rendered force variation consistently safe across trial diversity and participant heterogeneity.

Figure 7: Contact force distributions in physical handover tasks confirm that ContactRL with eCBF strictly maintains all contacts below the 10 N comfort threshold, with only minor variation due to object geometry and pose.

Theoretical and Practical Implications

ContactRL departs from the prevailing paradigm in safe RL—which typically equates safety with non-contact—by modeling, incentivizing, and guaranteeing safe intentional contact. Unlike compliance-based controllers (including variable impedance control), ContactRL with eCBF safety shield is robust to contact condition uncertainties, object geometry, and inter-participant variability, and does not require extensive gain-tuning or individualized reparameterization.

Theoretically, the eCBF shield closes the safety gap endemic to RL policies, functioning as a lightweight, model-agnostic runtime wrapper that guarantees adherence to provable safety bounds without compromising trajectory optimality. The framework demonstrates that high-dimensional, contact-rich collaboration tasks can be solved efficiently with continuous end-to-end RL if safety is deeply integrated in both reward function and deployment.

Practically, this framework admits direct translation to real-world use for handover, tool delivery, and potentially other intimate assistive tasks such as dressing and feeding. While moderate sim-to-real transfer error persists (centimeter-scale final-pose accuracy), it is primarily attributable to limitations in physical modeling and the accuracy-safety trade-off inherent in CBF-based control. Further improvements are anticipated through higher-fidelity simulation and adaptive domain randomization.

Future Directions

  • Enhanced simulation fidelity and adaptive domain randomization to further minimize sim-to-real transfer error.
  • Extension to multi-contact or dynamic force tasks beyond the static handover paradigm.
  • Incorporation of additional metrics for safety and comfort, such as impulse, contact area, and subjective comfort reporting.
  • Application to other collaborative domains requiring soft but reliable intentional contact (personalized assistive care, social HRI tasks).

Conclusion

ContactRL demonstrates a rigorous and scalable approach to safe RL in contact-rich human-robot collaboration. By internalizing contact force feedback within the RL loop and decoupling safety via runtime kinetic energy-based CBF shielding, the framework achieves efficient, robust, and certifiably safe performance for collaborative object handover. The results compellingly support the viability of RL-based controllers for deployment in real-world contact-intensive settings, bridging the gap between policy optimality and safety-critical compliance.
