Sharpness evolution and its relationship to optimization and performance at LLM scale
Determine how loss-landscape sharpness in large language models (LLMs) evolves during large-scale training, and characterize its relationship to optimization behavior and downstream performance across tasks and data distributions.
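Sharpness is commonly operationalized as the local curvature of the training loss, most often the largest eigenvalue of the Hessian. The sketch below is a minimal illustration of that conventional proxy (it is not the scalable measure introduced in the referenced paper): it estimates the top Hessian eigenvalue by power iteration over Hessian-vector products computed with PyTorch autograd. The function and variable names (`top_hessian_eigenvalue`, `loss_fn`, `params`) are illustrative assumptions, not identifiers from the paper.

```python
# Minimal sketch: estimate sharpness as the top Hessian eigenvalue of the
# training loss via power iteration with Hessian-vector products (HVPs).
# This is a standard curvature proxy, not the method proposed in the paper.
import torch


def top_hessian_eigenvalue(loss_fn, params, n_iters=20, tol=1e-4):
    """Estimate the largest Hessian eigenvalue of loss_fn w.r.t. params.

    loss_fn: callable returning a scalar loss (built with the current params).
    params:  list of tensors with requires_grad=True.
    """
    loss = loss_fn()
    # First-order gradients with create_graph=True so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Random unit vector in parameter space (one tensor per parameter group).
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((x ** 2).sum() for x in v))
    v = [x / norm for x in v]

    eig = 0.0
    for _ in range(n_iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. params.
        gv = sum((g * x).sum() for g, x in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)

        # Rayleigh quotient v^T H v gives the current eigenvalue estimate.
        new_eig = sum((h * x).sum() for h, x in zip(hv, v)).item()

        # Re-normalize for the next power-iteration step.
        hv_norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / (hv_norm + 1e-12) for h in hv]

        if abs(new_eig - eig) < tol * (abs(eig) + 1e-12):
            eig = new_eig
            break
        eig = new_eig
    return eig


# Hypothetical usage on one batch during training:
#   params = [p for p in model.parameters() if p.requires_grad]
#   sharpness = top_hessian_eigenvalue(lambda: compute_loss(model, batch), params)
```

Tracking such an estimate over checkpoints is one way to study how curvature evolves during training, though each evaluation requires double backpropagation, which is part of why scalable measures are of interest at LLM scale.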
References
As a result, most existing studies are restricted to small-scale experiments (typically ∼ 10M parameters), leaving open questions about how sharpness evolves in LLMs at scale, and how it relates to optimization and downstream performance.
— A Scalable Measure of Loss Landscape Curvature for Analyzing the Training Dynamics of LLMs
(2601.16979 - Kalra et al., 23 Jan 2026) in Section 1 (Introduction), final paragraph