Long-Horizon Video Generation for Robotics
Determine training and inference techniques that enable diffusion and flow-matching video generation models, used as embodied world models in robotics, to produce minutes-long videos with sustained temporal coherence and physical consistency. Such techniques should avoid the artifacts introduced by stitching multiple short clips and overcome the current limit of only a few seconds of generation.
References
While SOTA video models excel in short-duration video generation tasks, scaling these models to longer horizons for robotics tasks remains an open challenge.
— Video Generation Models in Robotics -- Applications, Research Challenges, Future Directions
(2601.07823 - Mei et al., 12 Jan 2026) in Subsection 7.8 (Long Video Generation)