Pre-training fraction needed to prevent catastrophic forgetting during fine-tuning
Determine what fraction of pre-training data must be included in the fine-tuning (mid-training) data mix to prevent catastrophic forgetting of pre-trained capabilities in large language models while still enabling adaptation to specialized tasks.
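To make the quantity concrete, here is a minimal sketch (not from the paper) of how a fine-tuning mix with a tunable pre-training replay fraction could be assembled; the function name, variable names, and placeholder datasets are all hypothetical:

```python
import random

def build_mixed_stream(finetune_data, pretrain_data, replay_fraction, seed=0):
    """Yield training examples where, in expectation, `replay_fraction`
    of draws come from the pre-training corpus (replay) and the rest
    from the fine-tuning set."""
    rng = random.Random(seed)
    while True:
        if rng.random() < replay_fraction:
            # Replayed pre-training example, intended to preserve
            # pre-trained capabilities.
            yield rng.choice(pretrain_data)
        else:
            # Specialized fine-tuning example driving adaptation.
            yield rng.choice(finetune_data)

# Example: a mix in which roughly 25% of examples are pre-training replay.
finetune = [f"ft-{i}" for i in range(100)]    # placeholder fine-tuning set
pretrain = [f"pt-{i}" for i in range(1000)]   # placeholder pre-training corpus
stream = build_mixed_stream(finetune, pretrain, replay_fraction=0.25)
batch = [next(stream) for _ in range(8)]
print(batch)
```

Sweeping `replay_fraction` while tracking held-out pre-training loss alongside downstream task performance is one straightforward way to probe the question empirically.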
References
However, it remains unclear what fraction of pre-training data is sufficient to effectively prevent catastrophic forgetting.
— A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs
(arXiv:2601.16979, Kalra et al., 23 Jan 2026), Section 4 ("How much Pre-training data is needed to avoid Catastrophic forgetting?"), first paragraph