
Learning a Unified Policy for Position and Force Control in Legged Loco-Manipulation

Published 27 May 2025 in cs.RO (arXiv:2505.20829v2)

Abstract: Robotic loco-manipulation tasks often involve contact-rich interactions with the environment, requiring the joint modeling of contact force and robot position. However, recent visuomotor policies often focus solely on learning position or force control, overlooking their co-learning. In this work, we propose the first unified policy for legged robots that jointly models force and position control learned without reliance on force sensors. By simulating diverse combinations of position and force commands alongside external disturbance forces, we use reinforcement learning to learn a policy that estimates forces from historical robot states and compensates for them through position and velocity adjustments. This policy enables a wide range of manipulation behaviors under varying force and position inputs, including position tracking, force application, force tracking, and compliant interactions. Furthermore, we demonstrate that the learned policy enhances trajectory-based imitation learning pipelines by incorporating essential contact information through its force estimation module, achieving approximately 39.5% higher success rates across four challenging contact-rich manipulation tasks compared to position-control policies. Extensive experiments on both a quadrupedal manipulator and a humanoid robot validate the versatility and robustness of the proposed policy across diverse scenarios.

Summary

  • The paper introduces a unified reinforcement learning framework that simultaneously manages position and force commands without traditional force sensors.
  • The method employs an impedance control framework and PPO training in simulated environments, achieving accurate tracking and robust performance in contact-rich tasks.
  • The approach enhances legged robot loco-manipulation by enabling force-aware imitation learning, yielding improved compliance and reduced sim-to-real discrepancies.

Learning a Unified Policy for Position and Force Control in Legged Loco-Manipulation

Abstract and Introduction

The paper "Learning a Unified Policy for Position and Force Control in Legged Loco-Manipulation" (2505.20829) presents a novel approach to controlling legged robots in loco-manipulation activities, specifically addressing the interplay between position and force required for contact-rich interactions. The proposed method uses a reinforcement learning framework to train a unified policy that manages position and force commands simultaneously, without relying on traditional force sensors. This enables the robots to perform diverse manipulation behaviors under varying environmental conditions, improving performance in contact-rich tasks.

Methodology

Unified Formulation for Force and Position Control

The paper introduces an impedance control framework that takes both position and force commands as inputs. At its core, the formulation accounts for the net force acting on the robot and computes the desired target position from both active and passive forces. This yields adaptive behaviors such as position tracking, force application, and compliant responses to disturbances, all derived from a single policy, enhancing manipulation across different input scenarios (Figure 1).

Figure 1: Method Overview. Architecture of the unified position-force policy trained via reinforcement learning to track position and force commands under external disturbances, highlighting position and velocity compensation.
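
As a minimal sketch of how such a unified formulation can fold a force command into a position target, consider the toy impedance law below. The stiffness value, dimensions, and function names are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def unified_target(x_cmd, f_cmd, f_est, stiffness=150.0):
    """Shift the commanded position by the residual between the commanded
    force and the estimated external force, so one rule covers position
    tracking (zero forces), force application, and compliant retreat."""
    f_residual = np.asarray(f_cmd, dtype=float) - np.asarray(f_est, dtype=float)
    return np.asarray(x_cmd, dtype=float) + f_residual / stiffness

# Free space, no force command: the target is just the position command.
free = unified_target([0.4, 0.0, 0.5], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])

# Pressing: command 15 N downward while contact pushes back 10 N, so the
# target dips slightly into the surface until the force error closes.
press = unified_target([0.4, 0.0, 0.5], [0.0, 0.0, -15.0], [0.0, 0.0, -10.0])
```

When both force terms vanish the law degenerates to pure position tracking, which is what lets one policy cover the whole behavior spectrum.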

Learning and Policy Design

The authors train the policy with Proximal Policy Optimization (PPO) in simulated environments built on Isaac Gym for high-performance physics simulation. The observation space comprises the robot state, commands, and historical actions, and a multi-layer perceptron (MLP) estimates the robot's state and the external forces, a key component of the policy (Figure 2).

Figure 2: Diverse skills facilitated by our policy. Showcasing various control tasks enabled by the unified policy.
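
The force-estimation idea can be sketched as an MLP over a window of recent proprioceptive states. The window length, layer sizes, and random weights below are placeholder assumptions; in the paper's pipeline such an estimator would be trained, not drawn at random:

```python
import numpy as np

HIST_LEN, STATE_DIM, HIDDEN, FORCE_DIM = 10, 12, 64, 3

rng = np.random.default_rng(0)
# Placeholder weights; a real estimator is trained alongside the policy.
W1 = rng.normal(0.0, 0.1, (HIST_LEN * STATE_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, FORCE_DIM))
b2 = np.zeros(FORCE_DIM)

def estimate_force(state_history):
    """Map a flattened window of recent proprioceptive states (joint
    positions/velocities, past actions, ...) to an estimated external
    force (fx, fy, fz), without any physical force sensor."""
    x = np.asarray(state_history, dtype=float).reshape(-1)
    h = np.tanh(x @ W1 + b1)   # single hidden layer
    return h @ W2 + b2         # linear force output

f_hat = estimate_force(rng.normal(size=(HIST_LEN, STATE_DIM)))
```

The key design choice is that the estimator sees only histories the robot already records, so no extra hardware is needed at deployment.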

Experimental Setup

Force and Position Command Tracking

Performance evaluations in simulation demonstrate accurate tracking of both position and force commands. Real-world tests further validate the policy, with average errors within acceptable limits across tasks, indicating robustness to the sim-to-real gap (Figure 3).

Figure 3: Force and position control evaluation. Evaluation of control tracking errors in both simulated settings and real-world scenarios.

Force-aware Imitation Learning

The study extends its impact with a force-aware imitation learning pipeline that uses the learned policy to collect contact-rich demonstrations. These demonstrations improve imitation learning models, yielding significantly higher success rates on tasks that combine vision and contact interaction (Figure 4).

Figure 4: Force-aware imitation learning. Performance comparison in contact-rich tasks and demonstration of the data collection process.

Results and Implications

The unified policy exhibits substantial improvements in contact-rich manipulation tasks, as evidenced by enhanced compliance and force application capabilities across a spectrum of task environments. The integration of force estimation into imitation learning further demonstrates the potential for more nuanced interactions in environments with restricted sensory modalities.

Given the diverse applicability and demonstrated performance across both simulated and real-world settings, the approach holds promise for advancing legged robotic systems. Future work may explore multi-point force estimation, cross-embodiment applications, and further reduction in sim-to-real gaps via data-driven adjustment methods.

Conclusion

The proposed unified force-position control policy represents a significant step toward more versatile and robust legged robotic systems capable of efficient contact-rich manipulation. By eliminating the reliance on direct force sensors and showcasing the adaptability of reinforcement learning approaches, this work paves the way for improved loco-manipulation in increasingly complex real-world settings.


Explain it Like I'm 14

What is this paper about?

This paper is about teaching legged robots (like four-legged “dog” robots and humanoids) to both move and handle objects safely and smartly at the same time. The key idea is a single “brain” (a control policy) that can control where the robot moves its body or hand and how hard it pushes, without needing special force sensors. This lets the robot do tricky, contact-heavy tasks like wiping a board, opening cabinets, and pulling drawers.

What questions did the researchers ask?

The researchers focused on simple but important questions:

  • Can one policy control both position (where to move) and force (how hard to push) for a legged robot?
  • Can a robot learn to “feel” contact forces using only its own motion and joint histories, instead of physical force sensors?
  • Will this help robots perform contact-heavy tasks more reliably in the real world?
  • Can this “force-aware” control also make demonstration data better for imitation learning (teaching robots by example)?

How did they do it?

To make this easy to imagine, think of the robot like a person using their hand to press a sponge on a wall while walking. You need to decide where to move the hand (position) and how hard to press (force), and you must stay soft and safe if something pushes back (compliance).

Here’s their approach, explained in everyday terms:

  • The robot’s “brain” is a policy, which is just a set of rules that turn what the robot senses into actions (how to move its joints).
  • Instead of giving the robot a real sense of touch, they taught it to guess the push/pull forces by looking at its recent movement and how it responds to contact. This guessing module is called a force estimator.
  • They trained the policy in a fast simulator using reinforcement learning (like practicing a video game with trial-and-error):
    • The robot is given different position goals (where to move its “hand”) and force goals (how hard to push).
    • The environment sometimes “pushes back” with random disturbances, like someone nudging the robot or the tool getting stuck.
    • The robot gets rewarded for following the position and force goals while staying stable and safe.
  • The control idea mimics a spring and shock absorber:
    • Spring: if the robot’s hand is pushed off target, it gently pushes back toward the target.
    • Shock absorber (damping): it smooths out the motion so it’s not jerky.
    • This style is known as impedance control, and it makes the robot naturally compliant (safe and cushioned) in contact.
  • A single unified policy learns to:
    • Track positions accurately when needed.
    • Apply specific forces when needed.
    • Blend both at the same time (hybrid control).
    • Stay soft and safe when the environment changes or pushes back.
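
The spring-and-shock-absorber idea above can be shown in a one-dimensional toy simulation. All masses, gains, and numbers here are made up for illustration:

```python
def simulate_press(steps=1000, dt=0.002, k=80.0, d=12.0, m=1.0, f_push=4.0):
    """1-D 'hand pushed off its target': a spring pulls the hand back
    toward x = 0, a damper smooths the motion, and a constant external
    push shoves it away. Instead of fighting the push to zero error, the
    compliant hand settles at the gentle offset f_push / k."""
    x, v = 0.1, 0.0
    for _ in range(steps):
        force = -k * x - d * v + f_push   # spring + damper + disturbance
        v += (force / m) * dt             # semi-implicit Euler step
        x += v * dt
    return x

rest = simulate_press()   # settles near f_push / k = 0.05 m
```

A stiffer spring (larger k) shrinks the offset but makes contact harsher, which is exactly the trade-off impedance control lets the policy tune.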

They trained and tested on two real robots:

  • A quadruped with an arm (Unitree B2-Z1).
  • A humanoid robot (Unitree G1).

They also used the learned policy to collect better training examples for imitation learning. During human teleoperation, the policy provides estimated contact forces along with images and robot states. This creates “force-aware” demonstrations that include both what to do and how hard to push.
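
A minimal sketch of what one "force-aware" demonstration frame might hold is below. The field names and array shapes are illustrative assumptions, not the paper's data format:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ForceAwareFrame:
    """One teleoperation timestep. The key point is that the policy's
    estimated contact force is logged next to the usual image/state/action
    data, so the imitation learner can condition on it."""
    image: np.ndarray        # camera frame, e.g. (H, W, 3) uint8
    robot_state: np.ndarray  # proprioception (joint positions/velocities)
    action: np.ndarray       # teleoperated command at this step
    force_est: np.ndarray    # estimated external force (fx, fy, fz)

demo = [ForceAwareFrame(
    image=np.zeros((64, 64, 3), dtype=np.uint8),
    robot_state=np.zeros(24),
    action=np.zeros(7),
    force_est=np.array([0.0, 0.0, -8.0]),  # pressing down with ~8 N
)]
```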

What did they find?

Here are the main results and why they matter:

  • One policy can handle many behaviors:
    • Position tracking: move the robot’s “hand” where it should be.
    • Force control: press with a certain force.
    • Force tracking: keep contact light or zero when needed.
    • Impedance/compliance: be springy and safe when interacting with people or objects.
    • Base compliance: adjust the robot’s walking speed or posture if something pushes its body.
  • Accuracy in tests:
    • Position tracking errors stayed mostly within about 0.1 meters in simulation.
    • Direct force control on real robots reached average errors within about 5–10 Newtons across tested positions, which is often good enough for everyday manipulation.
  • Better imitation learning:
    • Adding the estimated contact force to demonstration data boosted success rates by about 39.5% across four real tasks:
      • Wiping a blackboard (needs steady contact while moving).
      • Opening and closing a push-to-open cabinet (needs just the right push).
      • Opening a drawer even when the camera view is blocked (force tells the robot it made contact even when it can’t see).
    • This shows that feeling-like information (even estimated) is crucial when vision alone is not enough.
  • Works on different robots:
    • The same idea helped both a quadruped manipulator and a humanoid handle pushes and stay balanced or compliant.

Why does this matter?

  • Safer, smarter contact: Robots often need to push, pull, or press without breaking things or hurting themselves. A unified position-and-force controller makes that easier and more reliable.
  • No force sensors needed: Force sensors can be expensive or hard to mount. Estimating force from motion makes the system more practical and widely usable.
  • Better training data: Many robot datasets only record positions and images, missing the “feel” of contact. Adding estimated force makes training examples much more informative, especially for tasks where touch matters more than sight.
  • Generality: A single, learned policy that covers many behaviors reduces complexity and can transfer across different legged robots.

Limitations and future directions

  • Edge cases: Force estimation is less accurate during very fast interactions or at the edges of the robot’s reach.
  • Sim-to-real differences: Real hardware doesn’t perfectly match simulation, which can affect accuracy, especially along certain directions.
  • Single-point focus: The work mainly estimates force at one contact point. Future work could handle multiple contact points across the whole body (for example, bracing with the body while manipulating with the arm).

Bottom line

This paper shows a practical way to give legged robots a sense of “how hard to push” and “where to move” at the same time—using one learned policy and no force sensors. It improves real-world performance in contact-heavy tasks and produces better data for teaching robots by example, moving us closer to robots that can safely and usefully work with people in everyday environments.
