Stability of long-horizon video generation in autoregressive diffusion models
Develop training and architectural techniques that improve the stability of long-horizon video generation in autoregressive diffusion-based models, including the distilled causal-attention version of StereoWorld. The goal is for visual quality not to degrade noticeably as sequence length increases, in both stereo and monocular settings.
References
Improving the stability of long-horizon video generation therefore remains an open challenge shared by both monocular and stereo video synthesis.
— Stereo World Model: Camera-Guided Stereo Video Generation
(2603.17375 - Sun et al., 18 Mar 2026) in Supplementary Material, Section "Long Video Distillation"