Principled A Priori Criteria for Dataset Size Selection in Long-CoT Supervised Fine-Tuning
Develop principled a priori criteria for selecting the dataset size used in supervised fine-tuning (SFT) of pretrained language models on long chain-of-thought demonstrations. The criteria should account for dependence on model and data properties, so that practitioners can decide, before running experiments, how many unique samples to use versus how many epochs to train.
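To make the trade-off concrete, here is a minimal sketch of the decision a practitioner faces under a fixed token budget: fewer unique samples means more repetition (epochs), and vice versa. All names, values, and the accuracy threshold below are illustrative assumptions, not quantities from the paper; the training-token-accuracy stopping rule follows the quote in the reference below.

```python
"""Sketch of the samples-vs-epochs trade-off in long-CoT SFT under a
fixed token budget. All parameter values here are hypothetical."""

from dataclasses import dataclass


@dataclass
class BudgetPlan:
    num_samples: int  # unique long-CoT demonstrations
    epochs: float     # passes over the data implied by the budget


def plans_under_budget(token_budget: int,
                       avg_tokens_per_sample: int,
                       candidate_sizes: list[int]) -> list[BudgetPlan]:
    """Enumerate (dataset size, epochs) pairs that spend the same budget.

    Every plan consumes token_budget tokens; plans differ only in how
    they split that budget between unique samples and repetition.
    """
    plans = []
    for n in candidate_sizes:
        tokens_per_epoch = n * avg_tokens_per_sample
        plans.append(BudgetPlan(n, token_budget / tokens_per_epoch))
    return plans


def should_stop(train_token_accuracy: float, threshold: float = 0.95) -> bool:
    """Practical stopping signal for epoch scaling: halt once training
    token accuracy saturates. The 0.95 threshold is a made-up example;
    the open problem is choosing the dataset size *before* training,
    rather than relying on this in-training signal."""
    return train_token_accuracy >= threshold


if __name__ == "__main__":
    # E.g. a 100M-token SFT budget with ~10k tokens per long-CoT sample.
    for plan in plans_under_budget(100_000_000, 10_000, [1_000, 5_000, 10_000]):
        print(f"{plan.num_samples:>6} samples -> {plan.epochs:.1f} epochs")
```

The sketch only enumerates budget-equivalent plans; an a priori criterion of the kind the problem asks for would rank these plans from model and data properties alone.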
References
While training token accuracy provides a practical stopping signal for epoch scaling, the optimal dataset size is data- and model-dependent, and principled criteria for selecting it a priori remain elusive.
— Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning
(arXiv:2602.11149, Kopiczko et al., 11 Feb 2026), in Conclusion