A Local Convergence Theory for the Stochastic Gradient Descent Method in Non-Convex Optimization With Non-isolated Local Minima

Published 21 Mar 2022 in cs.LG, cs.NA, math.NA, and stat.ML | (2203.10973v3)

Abstract: Loss functions with non-isolated minima have emerged in several machine learning problems, creating a gap between theory and practice. In this paper, we formulate a new type of local convexity condition that is suitable to describe the behavior of loss functions near non-isolated minima. We show that such condition is general enough to encompass many existing conditions. In addition we study the local convergence of the SGD under this mild condition by adopting the notion of stochastic stability. The corresponding concentration inequalities from the convergence analysis help to interpret the empirical observation from some practical training results.