Principled Use of Offline Real-World Data with Simulated Rollouts

Characterize the effectiveness and governing principles of integrating offline real-world datasets with simulated rollouts from a learned world model when training vision–language–action robot policies. In particular, determine the mixing ratio between simulated imagination data and real-world experience that maximizes performance while preventing catastrophic forgetting.

Background

RISE shifts on-policy learning into imagination using a learned world model but still relies on offline real-world data to anchor behavior, as purely online learning can lead to instability and forgetting. Empirical results show that performance depends on the balance between simulated rollouts and real data, yet the precise ratio is not established.

The authors explicitly note that understanding why and how offline data contributes to stability and generalization, and how to tune its proportion relative to simulated data, is currently unresolved and requires principled investigation.
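To make the open question concrete, the sketch below shows one plausible way to expose the simulated-to-real balance as a single tunable knob: each training batch draws a fraction `real_ratio` of its samples from the offline real-world dataset and the remainder from world-model rollouts. The function name, the `real_ratio` parameter, and the data representation are all hypothetical illustrations, not the paper's implementation; the paper leaves the optimal value of this ratio unresolved.

```python
import random


def sample_mixed_batch(real_data, imagined_data, batch_size, real_ratio, seed=0):
    """Draw a training batch mixing offline real-world transitions with
    simulated rollouts from a learned world model.

    real_ratio in [0, 1] is the fraction of the batch taken from real
    data (the "anchor"); the rest comes from imagination. This knob is
    a hypothetical stand-in for the open tuning problem.
    """
    rng = random.Random(seed)
    n_real = round(batch_size * real_ratio)
    n_sim = batch_size - n_real
    batch = [rng.choice(real_data) for _ in range(n_real)]
    batch += [rng.choice(imagined_data) for _ in range(n_sim)]
    rng.shuffle(batch)  # interleave so gradients see both sources
    return batch


# Example: anchor 75% of each batch in real experience.
real = [("real", i) for i in range(100)]
sim = [("sim", i) for i in range(100)]
batch = sample_mixed_batch(real, sim, batch_size=32, real_ratio=0.75)
```

A principled study of the kind the authors call for would sweep `real_ratio` and measure both task performance and forgetting of behaviors present only in the real data.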

References

However, the optimal ratio between simulated rollouts and real-world experience still requires careful tuning. Understanding the effectiveness and governing principles of this offline data remains an open problem.

RISE: Self-Improving Robot Policy with Compositional World Model  (2602.11075 - Yang et al., 11 Feb 2026) in Section Limitations and future work — The Simulated–Real Data Balance