Derive scaling laws for alignment pretraining
Determine the precise scaling behavior of alignment pretraining interventions as a function of model size, data quantity, and training compute. Key open questions include whether small fixed data mixtures can reliably influence alignment priors at scale, and how these effects interact with increased post-training FLOPs. A hedged sketch of one possible approach follows.
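As a minimal illustration of what deriving such a scaling law could involve, the sketch below fits a Chinchilla-style saturating power law to synthetic measurements of a misalignment score across model sizes and alignment-data quantities. The functional form, the data, and every name and value are illustrative assumptions, not results or methods from the paper.

```python
# Hypothetical sketch: fit a Chinchilla-style saturating power law to
# synthetic alignment-pretraining measurements. All values and the
# functional form are assumptions for illustration only.
import numpy as np
from scipy.optimize import curve_fit

def alignment_scaling_law(X, E, A, alpha, B, beta):
    """Predicted misalignment score as a function of model parameters N
    and alignment-mixture tokens D: E + A*N^-alpha + B*D^-beta."""
    N, D = X
    return E + A * N ** (-alpha) + B * D ** (-beta)

# Grid of hypothetical runs: three model sizes x three alignment-data sizes.
N = np.repeat([1e8, 1e9, 1e10], 3)
D = np.tile([1e7, 1e8, 1e9], 3)

# Synthetic "measurements" generated from an assumed ground truth plus noise,
# standing in for real misalignment-benchmark scores.
rng = np.random.default_rng(0)
y = alignment_scaling_law((N, D), 0.05, 12.0, 0.32, 8.0, 0.28)
y += rng.normal(0.0, 0.005, size=y.shape)

# Fit the exponents; alpha and beta quantify how quickly the intervention's
# effect changes with model size and alignment-data quantity, respectively.
params, _ = curve_fit(
    alignment_scaling_law, (N, D), y,
    p0=[0.1, 10.0, 0.3, 10.0, 0.3],  # rough initial guesses
    maxfev=50000,
)
E, A, alpha, B, beta = params
print(f"alpha (model-size exponent) = {alpha:.3f}")
print(f"beta  (data exponent)       = {beta:.3f}")

# Extrapolate: does a small fixed alignment mixture (D = 1e8 tokens) still
# shift the prior for a model 10x larger than any in the fit?
print("predicted score at N=1e11:", alignment_scaling_law((1e11, 1e8), *params))
```

Extending this kind of fit with a post-training compute term would be one way to probe how pretraining-prior effects interact with increased post-training FLOPs.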
References
Although evidence suggests that the effect of pretraining priors increases with model size and data quantity, the precise scaling behaviour of safety interventions at pretraining remains unknown.
— Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment
(2601.10160 - Tice et al., 15 Jan 2026) in Section 7, Future Work – Scaling Laws for Alignment Pretraining