Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation

Published 31 Mar 2025 in cs.RO, cs.AI, and cs.LG | (2503.24361v2)

Abstract: Large real-world robot datasets hold great potential to train generalist robot models, but scaling real-world human data collection is time-consuming and resource-intensive. Simulation has great potential in supplementing large-scale data, especially with recent advances in generative AI and automated data generation tools that enable scalable creation of robot behavior datasets. However, training a policy solely in simulation and transferring it to the real world often demands substantial human effort to bridge the reality gap. A compelling alternative is to co-train the policy on a mixture of simulation and real-world datasets. Preliminary studies have recently shown this strategy to substantially improve the performance of a policy over one trained on a limited amount of real-world data. Nonetheless, the community lacks a systematic understanding of sim-and-real co-training and what it takes to reap the benefits of simulation data for real-robot learning. This work presents a simple yet effective recipe for utilizing simulation data to solve vision-based robotic manipulation tasks. We derive this recipe from comprehensive experiments that validate the co-training strategy on various simulation and real-world datasets. Using two domains--a robot arm and a humanoid--across diverse tasks, we demonstrate that simulation data can enhance real-world task performance by an average of 38%, even with notable differences between the simulation and real-world data. Videos and additional results can be found at https://co-training.github.io/


Summary

The paper introduces a co-training strategy that merges simulation and real-world data to significantly enhance robotic manipulation performance.
It examines the integration of task-aware and task-agnostic simulation data, emphasizing the importance of co-training ratios and data alignment.
Co-training with simulation data improved real-world task performance by an average of 38%, and the study also reports better policy generalization and robustness.


This paper introduces a systematic approach to improving vision-based robotic manipulation by combining simulation and real-world data. It explores sim-and-real co-training as a way to improve policy performance on real-world tasks while easing the data collection burden in robotics. The authors propose a framework that examines how simulation data can be integrated into real-world robot learning to improve generalization and task performance.

Problem Formulation and Approach

The authors detail a co-training strategy that leverages both real-world robotic demonstration data and synthetic data from simulation environments. The simulation environments are categorized into task-aware digital cousins and task-agnostic prior datasets, each contributing to the learning process in distinct ways.

Key insights concern how dataset factors such as task composition, object variation, and camera alignment affect training. The approach emphasizes co-training on diverse data compositions and tuning the co-training ratio, which balances the influence of simulation and real-world data during training.
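To make the role of the co-training ratio concrete, here is a minimal sketch of how a mixed training batch could be drawn. It assumes the ratio is the probability that each sample comes from the simulation dataset; this summary does not spell out the exact definition, so treat that as an assumption. The function name `sample_cotraining_batch` and the parameter `sim_ratio` are hypothetical and are not taken from the paper's code.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_cotraining_batch(
    real_data: Sequence[T],
    sim_data: Sequence[T],
    batch_size: int,
    sim_ratio: float,
    rng: random.Random,
) -> List[Tuple[str, T]]:
    """Draw a batch in which each item is taken from the simulation data
    with probability `sim_ratio` and from the real data otherwise.

    ASSUMPTION: this per-sample interpretation of the co-training ratio
    is illustrative; the paper may define or implement it differently.
    """
    if not 0.0 <= sim_ratio <= 1.0:
        raise ValueError("sim_ratio must lie in [0, 1]")
    if not real_data or not sim_data:
        raise ValueError("both datasets must be non-empty")

    batch: List[Tuple[str, T]] = []
    for _ in range(batch_size):
        if rng.random() < sim_ratio:
            # Tag each item with its source so the mix can be inspected.
            batch.append(("sim", rng.choice(sim_data)))
        else:
            batch.append(("real", rng.choice(real_data)))
    return batch
```

Sampling by source rather than pooling the two datasets keeps the mix independent of dataset size, so a small real dataset is not drowned out by a much larger simulation dataset.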

Figure 1: Real-World and Simulation Tasks. Displays three data sources in Kitchen Panda and Humanoid Tabletop domains for real-world tasks, task-aware digital cousin environments, and prior multi-task data from simulations.

Experimental Setup

The study employs two robotic domains: a Panda manipulator in a kitchen environment and a GR-1 humanoid robot in a tabletop setting. Various tasks are defined for each, covering a range of manipulation actions from pick-and-place to complex bimanual tasks.

Co-training experiments show that blending real and simulated data improves policy performance, even when the two data sources differ significantly. Effectiveness is measured by task success rate, generalization to unseen scenarios, and robustness to variations in object position and object type.

Figure 2: Effect of different co-training ratios. Highlights the importance of the co-training ratio for policy performance on the CupPnP task with 20 real demos and 1,000 simulation demos.
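The dataset sizes in this caption (20 real demos, 1,000 simulation demos) show why an explicit ratio matters. The arithmetic below is a hedged sketch: it assumes demos are sampled uniformly within each source and that the ratio is the probability of drawing from simulation. Both helper names are hypothetical.

```python
def sim_fraction_when_pooled(num_real: int, num_sim: int) -> float:
    """Share of simulation demos if both datasets are simply concatenated."""
    return num_sim / (num_real + num_sim)


def per_demo_weights(num_real: int, num_sim: int, sim_ratio: float) -> tuple:
    """Sampling probability of a single real demo and a single sim demo
    when the source is chosen first with probability `sim_ratio` for sim.

    ASSUMPTION: uniform sampling within each source; illustrative only.
    """
    real_weight = (1.0 - sim_ratio) / num_real
    sim_weight = sim_ratio / num_sim
    return real_weight, sim_weight


# Naive pooling of 20 real and 1,000 sim demos: simulation makes up
# 1000 / 1020 of the data, roughly 98%.
pooled = sim_fraction_when_pooled(20, 1000)

# With a ratio of 0.5, each real demo is drawn 50 times as often as
# each sim demo (0.025 versus 0.0005).
real_w, sim_w = per_demo_weights(20, 1000, 0.5)
```

Under these assumptions, the ratio acts as a reweighting knob: lowering it upweights the scarce real demos, while raising it approaches the naive pooled mix.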

Results and Analysis

The co-training approach achieved an average task performance improvement of 38% compared to training solely on real-world data. Task-aware digital cousins provided noticeable gains by offering semantically similar data, while task-agnostic data also contributed when aligned properly, particularly with automatic camera pose adjustments.

Key factors influencing co-training success include the amount of simulation data, the co-training ratio, and visual alignment between simulation and real-world environments. The study underscores the role of diverse simulation data in amplifying generalization capacities of learned policies, offering strategic guidelines for data composition.

Figure 3: Examples of the Video2Video model outputs with different noise strengths. Shows how video diffusion models can enhance visual realism by simulating different noise levels.

Limitations and Future Work

The research acknowledges its primary focus on pick-and-place manipulation tasks and suggests future exploration in simulating complex dynamics like deformation and fluid interaction. The paper also hints at leveraging advanced generative AI models for creating more realistic simulation environments. Further studies could explore broader domains and task types to solidify the generalizability of the proposed co-training recipe.

Conclusion

In summary, this paper presents a methodology for using simulation data to augment real-world robot training. The sim-and-real co-training strategy offers a viable way to overcome data collection limitations, improve task performance, and learn more robust policies through deliberate use of simulation datasets. Its findings give practitioners practical guidelines for blending real and synthetic data in robot learning.