Explain why Sequential Fine-Tuning surpasses multitask training on zero-shot generalization
Determine the underlying mechanism by which Sequential Fine-Tuning of large Vision-Language-Action (VLA) models in continual reinforcement learning settings often achieves better zero-shot generalization on held-out tasks than joint multitask training. In particular, rigorously test the hypothesis that task sequencing induces an implicit regularization effect that improves generalization.
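To pin down the two regimes being compared, here is a minimal sketch in PyTorch. All names (`make_task`, `sgd_steps`, the toy regression tasks) are illustrative assumptions, not the paper's setup: toy supervised tasks stand in for per-task RL fine-tuning of a VLA policy. Sequential fine-tuning visits tasks one at a time, warm-starting each stage from the previous weights, while joint multitask training optimizes over the union of all tasks; both are then evaluated zero-shot on a held-out task. The toy is not expected to reproduce the paper's finding, only to make the protocols concrete.

```python
# Minimal sketch (assumptions: toy linear-regression tasks as stand-ins for
# RL fine-tuning tasks; a small MLP as a stand-in for a VLA model).
import copy
import torch
from torch import nn

torch.manual_seed(0)

def make_task(seed, n=256, dim=8):
    """Toy 'task': a random linear mapping with noise (hypothetical stand-in)."""
    g = torch.Generator().manual_seed(seed)
    w = torch.randn(dim, 1, generator=g)
    x = torch.randn(n, dim, generator=g)
    y = x @ w + 0.1 * torch.randn(n, 1, generator=g)
    return x, y

def sgd_steps(model, x, y, steps=200, lr=1e-2):
    """Plain MSE regression updates; a stand-in for per-task RL fine-tuning."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()

def sequential_fine_tune(model, tasks):
    """Visit tasks one after another, warm-starting from the previous stage."""
    for x, y in tasks:
        sgd_steps(model, x, y)
    return model

def joint_multitask_train(model, tasks):
    """Train on the union of all tasks at once (the 'oracle' baseline)."""
    x = torch.cat([t[0] for t in tasks])
    y = torch.cat([t[1] for t in tasks])
    sgd_steps(model, x, y, steps=200 * len(tasks))  # matched update budget
    return model

train_tasks = [make_task(s) for s in range(4)]
held_out = make_task(99)  # zero-shot evaluation task, never trained on

base = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 1))
seq = sequential_fine_tune(copy.deepcopy(base), train_tasks)
joint = joint_multitask_train(copy.deepcopy(base), train_tasks)

for name, m in [("sequential", seq), ("joint", joint)]:
    with torch.no_grad():
        err = nn.functional.mse_loss(m(held_out[0]), held_out[1]).item()
    print(f"{name}: held-out MSE = {err:.3f}")
```

The implicit-regularization hypothesis would predict a systematic gap on the held-out task between these two otherwise compute-matched runs; testing it at VLA scale would additionally require controlling task order, stage length, and optimizer state across stages.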
References
What is more intriguing is that Sequential Fine-Tuning often maintains a slight edge over oracle multi-task training on the generalization capabilities. We do not yet have a definitive explanation for this phenomenon.
— Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning
(arXiv:2603.11653, Hu et al., 12 Mar 2026), Section 6.3 ("Why Good Zero-shot Generalization?")