A Piecewise Lyapunov Analysis of sub-quadratic SGD: Applications to Robust and Quantile Regression
The paper investigates the stochastic gradient descent (SGD) algorithm for minimizing objective functions that are locally strongly convex but exhibit sub-quadratic tails, which are common in many modern statistical methods, including robust and quantile regression. The authors introduce a piecewise Lyapunov function to handle objective functions that are only first-order differentiable, covering a wide range of loss functions such as the Huber loss. Leveraging this novel Lyapunov function, they derive moment bounds and investigate the convergence properties of sub-quadratic SGD, providing insight into its applicability to various online statistical algorithms.
Key Contributions
Lyapunov Function for Sub-quadratic SGD: The authors propose a novel piecewise Lyapunov function that effectively captures the dynamics of sub-quadratic SGD without requiring twice differentiability of the objective function. This makes a detailed analysis possible for objectives that are only first-order differentiable, significantly extending the scope of prior work.
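The Huber loss mentioned above is the canonical example of such an objective: it and its gradient are continuous, but the gradient's derivative jumps at the transition threshold, so the loss is not twice differentiable there. A minimal sketch (the parameter name `delta` for the threshold is illustrative):

```python
def huber_loss(u: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |u| <= delta, linear (sub-quadratic) beyond."""
    if abs(u) <= delta:
        return 0.5 * u * u
    return delta * (abs(u) - 0.5 * delta)

def huber_grad(u: float, delta: float = 1.0) -> float:
    """Gradient of the Huber loss. It is continuous everywhere, but its
    derivative jumps from 1 to 0 at |u| = delta, so the loss is only
    first-order differentiable -- the setting the piecewise Lyapunov
    function is built to handle."""
    if abs(u) <= delta:
        return u
    return delta if u > 0 else -delta
```

The linear tails are exactly what makes the objective "sub-quadratic": far from the minimum, the gradient stays bounded instead of growing linearly, which is what breaks classical quadratic Lyapunov arguments.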
Finite-time Moment Bounds: The paper provides finite-time moment bounds for sub-quadratic SGD under both constant and diminishing stepsizes. Unlike previous studies, which imposed restrictive assumptions on noise and required the objective functions to be twice differentiable, this work relaxes these conditions, allowing for broader applicability.
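The two stepsize regimes the bounds cover can be expressed in one generic recursion; the sketch below is illustrative (the interface is not from the paper), with the schedule passed in as a function of the iteration index:

```python
def sgd(grad_sample, theta0: float, n_steps: int, stepsize) -> float:
    """Generic SGD recursion theta_{t+1} = theta_t - alpha_t * g_t(theta_t).

    `stepsize` maps the iteration index t to alpha_t, so a constant
    schedule is `lambda t: alpha` and a diminishing one is, e.g.,
    `lambda t: c / (t + 1)`. `grad_sample` returns a (possibly noisy)
    gradient at the current iterate.
    """
    theta = theta0
    for t in range(n_steps):
        theta -= stepsize(t) * grad_sample(theta)
    return theta
```

Under a constant schedule the iterates settle in a stepsize-dependent neighborhood of the minimizer, while a diminishing schedule drives the iterates to the minimizer itself; the paper's moment bounds quantify both behaviors in finite time.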
Weak Convergence and Bias Characterization: For constant-stepsize SGD, the authors establish weak convergence to a limiting random variable and provide bounds on the asymptotic bias, revealing that it is proportional to the stepsize. These results show that one can use constant stepsizes for fast convergence and then achieve a higher-order reduction of the bias through techniques such as Richardson-Romberg extrapolation.
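Because the asymptotic bias is proportional to the stepsize, the standard Richardson-Romberg trick applies: run the algorithm at stepsizes alpha and 2*alpha and combine the two estimates so the leading bias term cancels. A minimal sketch (the `estimate` interface is an assumption for illustration, not the paper's API):

```python
def rr_extrapolate(estimate, alpha: float) -> float:
    """Richardson-Romberg extrapolation for a stepsize-proportional bias.

    If estimate(alpha) ~= theta* + c * alpha + O(alpha^2), then
    2 * estimate(alpha) - estimate(2 * alpha) cancels the c * alpha term,
    leaving only the O(alpha^2) remainder.

    `estimate(a)` is assumed to return the (averaged) constant-stepsize
    SGD iterate obtained with stepsize a.
    """
    return 2.0 * estimate(alpha) - estimate(2.0 * alpha)
```

For instance, if the estimator's bias were exactly linear, `estimate(a) = 5.0 + 3.0 * a`, the combination recovers the target value 5.0 for any alpha.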
Applications to Robust and Quantile Regression: The study applies its theoretical findings to robust and quantile regression, providing non-asymptotic moment bounds and rates of convergence. For robust regression, the results apply to models with sub-exponential covariates and heavy-tailed errors, demonstrating the method's efficiency even when the data exhibit significant corruption and noise. For quantile regression, the analysis does not require continuity of the conditional density, which broadens its applicability to real-world scenarios.
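Quantile regression illustrates why the non-smooth analysis matters: the pinball (check) loss rho_tau(u) = u * (tau - 1{u < 0}) is not even differentiable at zero, yet the SGD recursion on it is simple. A minimal scalar sketch for tracking the tau-quantile of a data stream (the function name and interface are illustrative, not the paper's):

```python
def online_quantile(samples, tau: float, alpha: float) -> float:
    """SGD on the pinball loss rho_tau(u) = u * (tau - 1{u < 0}), u = y - q.

    The subgradient of rho_tau with respect to q is (1{y < q} - tau), so
    each step nudges the estimate q up when the sample lies above it and
    down otherwise, with tau-dependent weights. With a suitable stepsize,
    q tracks the tau-quantile of the stream.
    """
    q = 0.0
    for y in samples:
        q -= alpha * ((1.0 if y < q else 0.0) - tau)
    return q
```

With tau = 0.5 this is the classic online median recursion; note that each update uses only the sign of y - q, which is why no smoothness (and, per the paper, not even continuity of the conditional density) is needed.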
Implications and Future Directions
The theoretical developments in this paper have profound practical implications for online learning and statistical estimation methods in machine learning. They allow for the analysis and implementation of robust algorithms in environments with adverse conditions such as non-Gaussian noise and data contamination, thereby broadening the application potential of SGD in real-world situations.
The extension of sub-quadratic SGD results via piecewise Lyapunov functions opens up new avenues for research. A promising direction lies in extending these techniques to other settings, such as SGD on objectives with sub-linear tail growth, which pose similar analytical challenges. Additionally, exploring weak convergence for non-smooth optimization problems such as quantile regression could inform the development of robust algorithms with improved theoretical guarantees.
Conclusion
By proposing a novel piecewise Lyapunov framework, the authors extend the theoretical understanding of sub-quadratic SGD. Their work provides a rigorous mathematical foundation and practical computational techniques for addressing complex challenges in robust and quantile regression. This paper not only enhances algorithmic efficiency and robustness but also significantly contributes to the broader field of optimization in statistical machine learning.