Learning Multi-Arm Manipulation Through Collaborative Teleoperation

Published 12 Dec 2020 in cs.RO, cs.AI, and cs.LG | (2012.06738v1)

Abstract: Imitation Learning (IL) is a powerful paradigm to teach robots to perform manipulation tasks by allowing them to learn from human demonstrations collected via teleoperation, but has mostly been limited to single-arm manipulation. However, many real-world tasks require multiple arms, such as lifting a heavy object or assembling a desk. Unfortunately, applying IL to multi-arm manipulation tasks has been challenging -- asking a human to control more than one robotic arm can impose significant cognitive burden and is often only possible for a maximum of two robot arms. To address these challenges, we present Multi-Arm RoboTurk (MART), a multi-user data collection platform that allows multiple remote users to simultaneously teleoperate a set of robotic arms and collect demonstrations for multi-arm tasks. Using MART, we collected demonstrations for five novel two and three-arm tasks from several geographically separated users. From our data we arrived at a critical insight: most multi-arm tasks do not require global coordination throughout its full duration, but only during specific moments. We show that learning from such data consequently presents challenges for centralized agents that directly attempt to model all robot actions simultaneously, and perform a comprehensive study of different policy architectures with varying levels of centralization on our tasks. Finally, we propose and evaluate a base-residual policy framework that allows trained policies to better adapt to the mixed coordination setting common in multi-arm manipulation, and show that a centralized policy augmented with a decentralized residual model outperforms all other models on our set of benchmark tasks. Additional results and videos at https://roboturk.stanford.edu/multiarm .




Summary

The paper introduces MART, a platform enabling multiple remote users to collaboratively teleoperate robotic arms for complex manipulation tasks.
It evaluates various policy architectures and demonstrates that combining centralized and decentralized strategies enhances coordination and robustness.
The base-residual policy framework, instantiated as the r-HBC and rd-HBC variants, outperforms purely centralized and purely decentralized baselines on five multi-arm manipulation tasks.


This paper introduces Multi-Arm RoboTurk (MART), a multi-user data collection platform designed to facilitate imitation learning (IL) for multi-arm manipulation tasks. The core innovation lies in enabling multiple remote users to simultaneously teleoperate a set of robotic arms, thereby overcoming the cognitive burden associated with single-operator multi-arm control. The authors demonstrate the effectiveness of MART by collecting demonstrations for five novel two-arm and three-arm tasks, and subsequently analyze various policy architectures with different levels of centralization to address the challenges posed by mixed coordination requirements in multi-arm manipulation. The paper concludes by proposing a base-residual policy framework that combines centralized and decentralized control strategies, achieving superior performance compared to purely centralized or decentralized approaches.

System Design and Implementation

The MART system builds upon the existing RoboTurk platform, extending its capabilities to support collaborative teleoperation. Key features of the system include:

Multi-User Teleoperation: Allows multiple remote users to simultaneously control individual robot arms via a smartphone interface and web browser, reducing cognitive load and expanding the pool of potential demonstrators (Figure 1).
Real-Time Synchronization: Employs WebRTC to establish low-latency communication channels and implements synchronization logic to ensure that all robot arms are actuated simultaneously, giving each user a consistent, real-time view of the simulation.
User-Specific Viewpoints: Renders tailored video streams from cameras positioned to provide optimal viewpoints for each operator, enhancing situational awareness and control precision.

Figure 1: Multi-Arm RoboTurk System Diagram illustrating how the system enables multiple remote users to collaboratively teleoperate robot arms and collect multi-arm task demonstrations.
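The summary does not describe how the synchronization logic works internally. As a minimal sketch of one way to actuate all arms together, the following assumes a lockstep buffer that holds each operator's latest action and releases a joint action only once every arm has reported. The class name `LockstepBuffer`, the arm identifiers, and the action format are all hypothetical and not taken from MART.

```python
from typing import Dict, List, Optional


class LockstepBuffer:
    """Collects one action per arm and releases them only as a complete set.

    Hypothetical illustration; not the MART implementation.
    """

    def __init__(self, arm_ids: List[str]) -> None:
        self.arm_ids = list(arm_ids)
        self.pending: Dict[str, List[float]] = {}

    def submit(self, arm_id: str, action: List[float]) -> Optional[Dict[str, List[float]]]:
        """Store the latest action for one arm.

        Returns the joint action once every arm has reported, else None.
        """
        if arm_id not in self.arm_ids:
            raise KeyError(f"unknown arm: {arm_id}")
        # A newer action from the same operator overwrites the older one.
        self.pending[arm_id] = list(action)
        if len(self.pending) < len(self.arm_ids):
            return None
        joint = {arm: self.pending[arm] for arm in self.arm_ids}
        self.pending = {}  # start collecting for the next simulation step
        return joint


buffer = LockstepBuffer(["arm0", "arm1"])
first = buffer.submit("arm0", [0.1, 0.0, 0.0])   # still waiting for arm1
second = buffer.submit("arm1", [0.0, 0.2, 0.0])  # both present: joint action released
```

In this sketch the simulator would be stepped only when `submit` returns a joint action, so no arm moves ahead of the others. A real system would also need a timeout or default action for an operator whose connection stalls.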

Addressing Mixed Coordination Challenges

The paper identifies a critical challenge in multi-arm manipulation: many tasks do not require continuous global coordination but rather exhibit phases of independent operation interspersed with periods of tight synchronization (Figure 2).

Figure 2: Multi-Stage Multi-Arm Manipulation with Mixed Coordination, exemplified by table assembly where independent column assembly precedes coordinated tabletop alignment.

To address this, the authors conduct a comprehensive study of different policy architectures, including:

Centralized Agents: Utilize the entire state space to generate actions for all robots, enabling explicit coordination but potentially overfitting to spurious correlations.
Decentralized Agents: Generate robot-specific actions based solely on local observations, avoiding overfitting but struggling with tasks requiring synchronization.
Hierarchical Behavioral Cloning (HBC): Employs a high-level policy to predict subgoals and a low-level policy to execute actions, providing temporal abstraction and improved learning from offline demonstrations.
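The difference between the centralized and decentralized agents above comes down to which observations each policy is allowed to see. The sketch below illustrates only that routing, using toy stand-in functions rather than learned networks, and it leaves out the hierarchical subgoal structure of HBC. All function names and the observation layout are assumptions made for illustration.

```python
from typing import Callable, Dict, List

Vector = List[float]
Policy = Callable[[Vector], Vector]


def centralized_step(policy: Policy, obs: Dict[str, Vector], action_dim: int) -> Dict[str, Vector]:
    """One policy sees every arm's observation and emits all actions at once."""
    arms = sorted(obs)
    joint_obs = [x for arm in arms for x in obs[arm]]
    joint_action = policy(joint_obs)
    # Split the flat joint action back into one slice per arm.
    return {arm: joint_action[i * action_dim:(i + 1) * action_dim]
            for i, arm in enumerate(arms)}


def decentralized_step(policies: Dict[str, Policy], obs: Dict[str, Vector]) -> Dict[str, Vector]:
    """Each arm's policy sees only that arm's own observation."""
    return {arm: policies[arm](obs[arm]) for arm in obs}


def toy_joint_policy(joint_obs: Vector) -> Vector:
    # Stand-in for a learned network: every output depends on every input.
    return [sum(joint_obs)] * 4  # 2 arms x 2-D action


def toy_local_policy(local_obs: Vector) -> Vector:
    # Stand-in for a learned per-arm network.
    return [sum(local_obs)] * 2


obs = {"arm0": [1.0, 2.0], "arm1": [10.0, 20.0]}
central = centralized_step(toy_joint_policy, obs, action_dim=2)
local = decentralized_step({"arm0": toy_local_policy, "arm1": toy_local_policy}, obs)
```

In the centralized call, the action for `arm0` changes whenever the observation of `arm1` changes, which is the pathway through which spurious cross-arm correlations can be learned. In the decentralized call that pathway does not exist, and neither does any way to coordinate.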

The study reveals that centralized agents perform poorly compared to distributed variants due to "hallucinating" incorrect correlations, while distributed agents struggle with synchronization.

Base-Residual Policy Framework

To overcome the limitations of purely centralized or decentralized approaches, the paper introduces a base-residual policy framework. This framework combines a base policy, which can be either centralized or decentralized, with a residual policy that learns to perturb the actions of the base policy. The guiding principle is that the base policy dictates the dominant behavior (coordinated or decoupled), while the residual policy supplies the complementary trait: a centralized residual adds coordination to a decentralized base, and a decentralized residual loosens the coupling of a centralized base.

The framework includes two variants:

r-HBC: A decentralized HBC base policy is augmented with a centralized residual network, improving coordination in tasks requiring synchronization.
rd-HBC: A centralized HBC base policy is augmented with a decentralized residual network, mitigating overfitting and encouraging generalization.

The residual network outputs a small correction to the action:

$a = \bar{a} + \delta, \quad \delta = \rho(\bar{a}, s), \quad \lVert \delta \rVert_2 < \epsilon$

where $\bar{a}$ is the action from the pretrained base policy, $s$ is the state, $\delta$ is the correction from the residual network $\rho$, and $\epsilon$ is a small constant that prevents the residual network from dominating the overall policy behavior.
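The summary does not say how the bound on the correction is enforced. One common option, assumed here purely for illustration, is to rescale the residual output whenever its L2 norm exceeds $\epsilon$. The function name `bounded_residual_action` and the toy residual functions are hypothetical. Note that rescaling yields a norm equal to $\epsilon$ in the clipped case, so this sketch enforces "at most $\epsilon$" rather than the strict inequality.

```python
import math
from typing import Callable, List

Vector = List[float]


def bounded_residual_action(
    base_action: Vector,
    state: Vector,
    residual_fn: Callable[[Vector, Vector], Vector],
    epsilon: float,
) -> Vector:
    """Return base_action + delta, with delta rescaled so its L2 norm is at most epsilon."""
    delta = residual_fn(base_action, state)
    norm = math.sqrt(sum(d * d for d in delta))
    if norm > epsilon:
        # Shrink the correction onto the epsilon-ball (assumed enforcement scheme).
        scale = epsilon / norm
        delta = [d * scale for d in delta]
    return [a + d for a, d in zip(base_action, delta)]


def large_residual(base_action: Vector, state: Vector) -> Vector:
    return [3.0, 4.0]  # L2 norm 5.0, well above the bound


def small_residual(base_action: Vector, state: Vector) -> Vector:
    return [0.1, 0.0]  # L2 norm 0.1, already within the bound


clipped = bounded_residual_action([1.0, 0.0], [0.0], large_residual, epsilon=0.5)
unclipped = bounded_residual_action([1.0, 0.0], [0.0], small_residual, epsilon=0.5)
```

With `epsilon=0.5`, the large correction is scaled by 0.1 to roughly `[0.3, 0.4]`, so the final action stays close to the base action, while the small correction passes through unchanged.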

Experimental Evaluation

The effectiveness of the proposed framework is demonstrated through experiments on five novel multi-arm manipulation tasks in simulation: Multi-Cube Lifting, Drink Tray Lifting, Table Assembly, Pick-Place Handover, and Lifting Wiping (Figure 3).

Figure 3: Multi-Cube Lifting task showing two robot arms independently lifting blocks.

The tasks are designed to showcase real-world scenarios requiring varying levels of coordination between agents. The results show that the base-residual policy framework consistently outperforms purely centralized or decentralized baselines across all tasks, highlighting its ability to adapt to mixed coordination settings. Specifically, r-HBC excels in complex, multi-stage tasks, while rd-HBC performs best in shorter-horizon tasks. The framework also demonstrates robustness to varying demonstration quality, maintaining performance improvements even with noisy training data.

Conclusion

The paper contributes a scalable multi-user data collection system and a policy framework that addresses the challenges of mixed coordination in multi-arm manipulation. The MART system lowers the barrier to entry for exploring multi-arm tasks, while the base-residual policy framework offers a promising approach for learning complex manipulation skills from demonstration data. This work opens avenues for future research, such as improving performance on challenging tasks like assembly and exploring emergent properties underlying multi-arm manipulation.
