Efficient Collection of Large-Scale Robotic Data

Determine practical, scalable methodologies for efficiently collecting large-scale, high-quality robotic datasets for embodied AI. This requires overcoming the time-consuming, labor-intensive nature of current data collection, and addressing two obstacles to alternative data sources: distribution shifts in simulator-to-real transfer, and human–robot morphology mismatches when leveraging human activity videos.

Background

The survey underscores that the breakthroughs of foundation models in vision and language are closely tied to massive datasets, whereas robotic datasets remain comparatively small and expensive to collect. For instance, RT-1 required 17 months to collect 130k episodes, highlighting the significant resource demands of current practices.

Two potential directions are discussed: scaling data collection in high-fidelity simulators followed by Sim2Real transfer, and utilizing large-scale human activity datasets. However, persistent challenges remain, including distribution shifts between simulation and reality and morphological differences between humans and robots, which complicate direct transfer of knowledge and motivate the need for more efficient data collection methods.
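One common way to mitigate the sim-to-real distribution shift mentioned above is domain randomization: varying simulator parameters per episode so that the real world looks like just another sample from the training distribution. The sketch below is illustrative only; the parameter names and ranges are assumptions, not taken from the survey, and a real pipeline would feed each sampled parameter set into a physics simulator and record the resulting trajectory.

```python
import random

# Illustrative randomization ranges (assumed, not from the survey).
RANDOMIZATION_RANGES = {
    "friction": (0.5, 1.5),            # surface friction coefficient
    "object_mass_kg": (0.1, 2.0),      # mass of the manipulated object
    "light_intensity": (0.2, 1.0),     # rendering brightness
    "camera_jitter_deg": (-5.0, 5.0),  # camera pose perturbation
}

def sample_episode_params(rng: random.Random) -> dict:
    """Draw one set of simulator parameters for a single episode."""
    return {name: rng.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}

def collect_randomized_episodes(n_episodes: int, seed: int = 0) -> list:
    """Generate per-episode parameter sets. A real pipeline would pass
    each set to the simulator and log the rollout as training data."""
    rng = random.Random(seed)
    return [sample_episode_params(rng) for _ in range(n_episodes)]

if __name__ == "__main__":
    for i, params in enumerate(collect_randomized_episodes(3)):
        print(i, params)
```

The design intent is that a policy trained across many such perturbed worlds is less likely to overfit to any single simulator configuration, which is the core argument for scaling collection in simulation before real-world transfer.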

How to collect robotic data more efficiently therefore remains a key open question.

References

A Survey on Robotics with Foundation Models: Toward Embodied AI (Xu et al., 2024, arXiv:2402.02385), Section 5.3, Efficient Data Collection.