Relaxing the bounded-risk assumption in the SGD analysis
Establish convergence guarantees for stochastic gradient descent with polynomially decaying learning rate η_t = η/t^γ and tail averaging in overparameterized linear regression with Gaussian covariates, under assumptions weaker than Assumption 1 (the bounded-risk condition R(θ_t) ≤ cσ^2 for all t). In particular, identify minimal sufficient conditions under which the excess-risk rates derived in the paper continue to hold.
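The sketch below illustrates the setup in the problem statement, not the paper's experiments: single-sample SGD on overparameterized Gaussian linear regression with learning rate η_t = η/t^γ, tail averaging of the last half of the iterates, and a running check of the excess risk R(θ_t) against the bounded-risk threshold cσ^2. All constants (dimension, horizon, γ, η, c, σ, the covariance spectrum, and the ground-truth parameter) are illustrative assumptions.

```python
# Minimal sketch, assuming a diagonal power-law covariance and illustrative
# constants (not the paper's setup): tail-averaged SGD with eta_t = eta / t**gamma
# on overparameterized Gaussian linear regression, tracking the excess risk
# R(theta_t) so the bounded-risk condition R(theta_t) <= c * sigma**2 can be
# checked empirically.
import numpy as np

rng = np.random.default_rng(0)

d, n = 500, 2000                       # ambient dimension exceeds the sample budget
sigma = 0.1                            # noise standard deviation (assumed)
gamma, eta, c = 0.5, 0.5, 10.0         # schedule exponent, base step size, risk-bound constant (assumed)

eigs = 1.0 / np.arange(1, d + 1) ** 2                   # hypothetical power-law covariance spectrum
theta_star = rng.normal(size=d) / np.arange(1, d + 1)   # hypothetical ground-truth parameter

def excess_risk(theta):
    """Excess risk (theta - theta_star)^T H (theta - theta_star) for diagonal covariance H."""
    diff = theta - theta_star
    return float(np.sum(eigs * diff ** 2))

theta = np.zeros(d)
tail_start = n // 2                    # tail averaging over the last half of the iterates
tail_sum = np.zeros(d)
risks = []

for t in range(1, n + 1):
    x = rng.normal(size=d) * np.sqrt(eigs)        # Gaussian covariate with covariance H
    y = x @ theta_star + sigma * rng.normal()     # noisy response
    eta_t = eta / t ** gamma                      # polynomially decaying learning rate
    theta -= eta_t * (x @ theta - y) * x          # single-sample SGD step
    risks.append(excess_risk(theta))
    if t > tail_start:
        tail_sum += theta

theta_bar = tail_sum / (n - tail_start)           # tail-averaged iterate
print(f"max_t R(theta_t) = {max(risks):.4f}  vs  c*sigma^2 = {c * sigma**2:.4f}")
print(f"R(tail average)  = {excess_risk(theta_bar):.4f}")
```

Comparing max_t R(θ_t) against cσ^2 in runs like this is one way to probe, empirically, when the bounded-risk condition is active and when it could plausibly be weakened.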
References
Assumption~\ref{ass:bounded_risk} is a mild assumption, since in general we expect a scheduler to only start decaying the learning rate once the risk is variance dominated. We believe Assumption~\ref{ass:bounded_risk} can be relaxed; it is used for mathematical convenience in the proof of Theorem~\ref{thm:tgamma_rate}, and we therefore leave the relaxation to future work.
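As a concrete reading of the quoted intuition, the hypothetical scheduler below holds the learning rate constant until an estimate of the risk falls below cσ^2 (the regime where the bounded-risk condition would hold) and only then switches to the η/t^γ decay. The class, its interface, and the idea of gating the decay on a risk estimate are illustrative assumptions, not a construction from the paper.

```python
class RiskAwareSchedule:
    """Hypothetical schedule: constant step size until the estimated risk is
    variance dominated (<= c * sigma**2), then polynomial decay eta / s**gamma
    re-indexed from the switch point."""

    def __init__(self, eta, gamma, c, sigma):
        self.eta, self.gamma = eta, gamma
        self.threshold = c * sigma ** 2
        self.t_switch = None           # iteration at which decay begins

    def step_size(self, t, risk_estimate):
        if self.t_switch is None and risk_estimate <= self.threshold:
            self.t_switch = t          # risk is now variance dominated; start decaying
        if self.t_switch is None:
            return self.eta            # warm-up phase: constant learning rate
        return self.eta / (t - self.t_switch + 1) ** self.gamma
```

In the SGD sketch above, such a schedule would replace the line computing eta_t, with risk_estimate supplied by whatever proxy for R(θ_t) is available in practice.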