Mechanism of Memorization-Driven Generalization in Long-CoT Supervised Fine-Tuning
Establish the causal mechanism behind the repetition advantage in supervised fine-tuning (SFT) of pretrained language models on long chain-of-thought demonstrations: specifically, why driving training token accuracy to near-perfect levels through multi-epoch repetition coincides with improved downstream generalization on reasoning benchmarks, rather than with overfitting, and without incurring additional catastrophic forgetting.
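Concretely, the memorization signal at issue is per-token greedy accuracy on the training demonstrations themselves. Below is a minimal PyTorch sketch of that metric, assuming a HuggingFace-style causal LM and the standard -100 label mask for prompt/padding tokens; the function name and interface are illustrative, not taken from the paper.

```python
import torch

@torch.no_grad()
def token_accuracy(model, input_ids, labels):
    """Fraction of supervised tokens predicted correctly by greedy (argmax) decoding.

    labels uses -100 to mask out prompt and padding positions, as in
    standard causal-LM SFT. Near-perfect values of this quantity on the
    training set are the memorization signal the open problem refers to.
    """
    logits = model(input_ids=input_ids).logits   # (batch, seq_len, vocab)
    preds = logits[:, :-1, :].argmax(dim=-1)     # prediction for token t+1 from prefix
    targets = labels[:, 1:]                      # labels shifted to align with preds
    mask = targets != -100                       # score only supervised tokens
    correct = (preds == targets) & mask
    return correct.sum().item() / mask.sum().item()
```

Tracked after each pass over a fixed set of long-CoT demonstrations, this quantity typically saturates near 1.0 under multi-epoch repetition; the open problem asks why that saturation coincides with better reasoning-benchmark performance.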
References
We argue that explaining why memorization under repetition improves generalization in reasoning SFT is an important open problem.
— Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning (arXiv:2602.11149, Kopiczko et al., 11 Feb 2026), Conclusion