Explain why Sequential Fine-Tuning surpasses multitask training on zero-shot generalization

Determine the underlying mechanism by which Sequential Fine-Tuning of large Vision-Language-Action models in continual reinforcement learning settings often achieves better zero-shot generalization on held-out tasks than joint multitask training, and rigorously test the hypothesis that task sequencing induces an implicit regularization effect that improves generalization.

Background

Across five continual reinforcement learning benchmarks and multiple Vision-Language-Action models, the authors observe that parameter-efficient Sequential Fine-Tuning with LoRA and on-policy RL frequently preserves and even enhances zero-shot generalization, sometimes outperforming a multitask oracle trained jointly on all tasks.

This outcome contrasts with conventional expectations that sequential training risks overfitting to the task sequence or forgetting previously learned capabilities. The paper hypothesizes that task sequencing may act as an implicit regularizer, exposing the model to a non-stationary objective that encourages more robust representations, but the root cause remains undetermined.
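The structural difference between the two training regimes can be made concrete with a toy sketch. The code below is an illustrative assumption, not the paper's implementation: each "task" is a small linear-regression problem standing in for an RL task, and a shared weight vector stands in for the (LoRA-adapted) policy parameters. Sequential fine-tuning visits tasks one after another, giving the non-stationary objective the hypothesis refers to, while joint multitask training samples minibatches from the pooled data at every step.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(d=8, n=64):
    """A toy 'task': linear regression with its own ground-truth weights."""
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    return X, X @ w_true

def sgd_step(w, X, y, lr=0.01):
    """One gradient step on mean squared error."""
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

tasks = [make_task() for _ in range(3)]

# Sequential fine-tuning: train to convergence on task 1, then task 2, ...
# The objective the model sees is non-stationary across the sequence.
w_seq = np.zeros(8)
for X, y in tasks:
    for _ in range(200):
        w_seq = sgd_step(w_seq, X, y)

# Joint multitask training: every step draws a minibatch from all tasks
# pooled together, so the objective is stationary throughout.
X_all = np.vstack([X for X, _ in tasks])
y_all = np.concatenate([y for _, y in tasks])
w_joint = np.zeros(8)
for _ in range(600):
    idx = rng.integers(0, len(y_all), size=64)
    w_joint = sgd_step(w_joint, X_all[idx], y_all[idx])

def pooled_loss(w):
    return float(np.mean((X_all @ w - y_all) ** 2))
```

The open question is why, at the scale of VLA models with LoRA and on-policy RL, the sequential regime on the left of this contrast would generalize better to held-out tasks than the joint regime; in this linear toy the joint objective has a well-defined optimum and no such effect is expected.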

References

More intriguingly, Sequential Fine-Tuning often maintains a slight edge over oracle multitask training in zero-shot generalization. We do not yet have a definitive explanation for this phenomenon.

Simple Recipe Works: Vision-Language-Action Models are Natural Continual Learners with Reinforcement Learning  (2603.11653 - Hu et al., 12 Mar 2026) in Section 6.3 (Why Good Zero-shot Generalization?)