Relaxing the bounded-risk assumption in the SGD analysis

Establish convergence guarantees for stochastic gradient descent (SGD) with a polynomially decaying learning rate η_t = η/t^γ and tail averaging in overparameterized linear regression with Gaussian covariates, under assumptions weaker than Assumption 1 (the bounded-risk condition R(θ_t) ≤ cσ^2 for all t), and identify minimal sufficient conditions under which the excess-risk rates derived in the paper continue to hold.
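For concreteness, the sketch below simulates this setup: one-pass SGD with step size η_t = η/t^γ and tail averaging over the second half of the trajectory. It is a minimal illustration only; the dimension, horizon, constants, and tail window are assumptions made here, not values taken from the paper.

# Minimal sketch of the setup above: one-pass SGD with a polynomially
# decaying learning rate eta_t = eta / t**gamma and tail averaging, on
# overparameterized linear regression with Gaussian covariates. All
# constants below (d, T, eta, gamma, sigma, tail fraction) are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, T = 4000, 2000            # d > T: more parameters than samples seen
gamma, sigma = 0.5, 0.1
eta = 0.5 / d                # 1/d scaling keeps eta_t * ||x||^2 < 1, since ||x||^2 ~ d

theta_star = rng.normal(size=d) / np.sqrt(d)   # ground truth, ||theta_star|| ~ 1
theta = np.zeros(d)
tail_start = T // 2          # tail averaging: average iterates from T/2 onward
tail_sum = np.zeros(d)

for t in range(1, T + 1):
    x = rng.normal(size=d)                      # Gaussian covariate, identity covariance
    y = x @ theta_star + sigma * rng.normal()   # noisy label
    eta_t = eta / t**gamma                      # polynomial decay
    theta += eta_t * (y - x @ theta) * x        # SGD step on the squared loss
    if t > tail_start:
        tail_sum += theta

theta_bar = tail_sum / (T - tail_start)         # tail-averaged iterate

# With identity covariance, the excess risk R(theta) - sigma^2 equals
# the squared parameter error ||theta - theta_star||^2.
print("excess risk, last iterate:", np.sum((theta - theta_star) ** 2))
print("excess risk, tail average:", np.sum((theta_bar - theta_star) ** 2))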

Background

The paper's main theoretical result analyzes SGD with a polynomially decaying step size and tail averaging under a bounded-risk assumption (Assumption 1), which requires the risk at every time t to be at most a constant factor above the noise level. This assumption simplifies the proof of the excess-risk rates for the proposed anytime schedule.
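Written out, the condition reads as follows (a paraphrase; the constant c and the exact definition of the risk R are those of the paper's Assumption 1):

% Assumption 1 (bounded risk), paraphrased: the risk of every SGD iterate
% stays within a constant factor c of the noise variance \sigma^2.
R(\theta_t) \le c\,\sigma^2 \quad \text{for all } t.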

The authors acknowledge that this assumption may be stronger than necessary and explicitly defer its relaxation to future work, leaving open the question of which weaker or more readily verifiable conditions would suffice to recover the same rates.

References

Assumption~\ref{ass:bounded_risk} is a mild assumption, since in general we expect a scheduler to only start decaying the learning rate once the risk is variance dominated. We believe assumption~\ref{ass:bounded_risk} can be relaxed but it provides mathematical convenience in the proof of Theorem~\ref{thm:tgamma_rate}, and thus we leave the relaxation to future work.

Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging (2602.03702 - Meterez et al., 3 Feb 2026) in Section 4, Theoretical Analysis, Main results (discussion following Assumption 1)