Generating diverse, interactive, and realistic simulation scenarios for robot learning

Develop simulation data generation methods that produce diverse, interactive, and realistic scenarios for training robot manipulation policies, thereby narrowing the sim-to-real gap.

Background

Within the data pyramid for robotics, simulation data is scalable and inexpensive but suffers from a significant sim-to-real gap. A key bottleneck is that current simulators and generation pipelines cannot reliably create scenarios that are diverse, interactive, and realistic enough to capture real-world complexity.

Solving this problem is essential for leveraging simulation at scale to improve generalization and reduce reliance on expensive real-world data collection for training manipulation policies.

References

Occupying the middle tier is simulation data (Wang et al., 2023; Li et al., 2023; Mu et al., 2024; Chen et al., 2025b); it is inexpensive and scalable but plagued by a significant sim-to-real gap, and the challenge of generating diverse, interactive, and realistic scenarios remains an open problem (Nasiriany et al., 2024; Ren et al., 2024; Zhang et al., 2025).

RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization (2602.03310 - Liu et al., 3 Feb 2026) in Section 2, Related Work (Data Pyramid for Robotics)