Lower bound on sample size for effective LIMO fine-tuning
Determine the minimum number of high-quality supervised fine-tuning samples required to maintain effective mathematical reasoning performance when fine-tuning Qwen2.5-32B-Instruct using the LIMO dataset and training recipe, as assessed on benchmarks such as AIME24 and MATH500.
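One way to make "minimum number of samples" operational, offered only as a sketch: fine-tune on nested subsets of increasing size, record benchmark accuracy for each, and report the smallest size from which accuracy stays within a chosen tolerance of the full-data result. The helper name `smallest_effective_size`, the `tolerance` parameter, and the tolerance-based criterion itself are assumptions made for illustration; the source does not define what counts as "effective" performance.

```python
from typing import Mapping, Optional


def smallest_effective_size(
    accuracy_by_size: Mapping[int, float],
    tolerance: float = 0.02,
) -> Optional[int]:
    """Hypothetical helper: return the smallest subset size such that this
    size and every larger measured size score within `tolerance` (absolute)
    of the accuracy obtained at the largest measured size.

    `accuracy_by_size` maps a training-subset size to a measured benchmark
    accuracy in [0, 1]; the caller supplies these measurements.
    """
    if not accuracy_by_size:
        return None
    full_size = max(accuracy_by_size)
    target = accuracy_by_size[full_size] - tolerance
    best = None
    # Walk from the largest size downward and stop at the first size that
    # falls below the target, so a lucky small subset cannot mask a dip.
    for size in sorted(accuracy_by_size, reverse=True):
        if accuracy_by_size[size] < target:
            break
        best = size
    return best
```

Scanning from the largest size downward keeps the answer conservative when accuracy is not monotonic in subset size. The function would be applied once per benchmark (for example AIME24 and MATH500), and the per-benchmark answers may differ.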
References
Our experiments reveal that a surprisingly small number (i.e. 800) of samples can elicit competition-level mathematical reasoning, though the lower bound for maintaining effective performance remains an open question.
— LIMO: Less is More for Reasoning
(2502.03387 - Ye et al., 5 Feb 2025) in Subsubsection RQ5: Sample Efficiency (Section Experiment → Analysis)