Addressing depth generalization limits in latent recurrent VLA models
Develop architectural innovations or training protocols that prevent state saturation and performance degradation in Recurrent-Depth VLA (RD-VLA) and related latent iterative-reasoning visuomotor policies when recurrence depth exceeds the empirically optimal range, thereby enabling reliable scaling of test-time compute in robotics.
References
A key limitation observed in our experiments is the boundary of depth generalization. Performance scales predictably with the number of recurrent steps up to some optimal depth, but extending recurrence beyond that point may lead to state saturation or performance degradation rather than continued refinement. Addressing this problem, perhaps through architectural innovations or specific training protocols, remains an open challenge for scaling latent reasoning in robotics.
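To make the failure mode concrete, the sketch below shows a shared-weight latent refinement loop with a crude saturation check: iteration halts once the relative change in the latent state stalls, rather than unrolling to a fixed depth. This is a minimal NumPy illustration under stated assumptions, not RD-VLA's actual architecture; the `core` function, tolerance, and halting rule are all hypothetical choices for exposition.

```python
import numpy as np

def recurrent_refine(core, z0, x, max_steps=32, tol=1e-4):
    """Iterate a shared-weight core on latent z, conditioned on input x.

    Halts early when the latent update stalls (a simple proxy for state
    saturation), instead of always unrolling to max_steps.
    """
    z = z0
    for step in range(1, max_steps + 1):
        z_next = core(z, x)
        # Relative change of the latent state between consecutive steps.
        delta = np.linalg.norm(z_next - z) / (np.linalg.norm(z) + 1e-8)
        z = z_next
        if delta < tol:
            # Further depth no longer refines the state: it has saturated.
            break
    return z, step

# Toy contractive core: converges to a fixed point, so the loop exits
# well before max_steps, illustrating the saturation plateau.
rng = np.random.default_rng(0)
W = 0.3 * rng.standard_normal((8, 8)) / np.sqrt(8)  # small spectral norm
b = rng.standard_normal(8)

def core(z, x):
    return np.tanh(W @ z + x)

x = 0.1 * b
z_final, steps_used = recurrent_refine(core, np.zeros(8), x)
print(f"halted after {steps_used} of 32 steps")
```

In this toy setting the extra iterations past the halting point would simply re-produce (nearly) the same latent state; the open problem in the text is the harder case where real policies do not merely plateau but degrade, so a convergence test alone is not a sufficient fix.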