Almost sure convergence of stochastic Hamiltonian descent methods

Published 24 Jun 2024 in math.OC (arXiv:2406.16649v3)

Abstract: Gradient normalization and soft clipping are two popular techniques for tackling instability issues and improving convergence of stochastic gradient descent (SGD) with momentum. In this article, we study these types of methods through the lens of dissipative Hamiltonian systems. Gradient normalization and certain types of soft clipping algorithms can be seen as (stochastic) implicit-explicit Euler discretizations of dissipative Hamiltonian systems, where the kinetic energy function determines the type of clipping that is applied. We make use of dynamical systems theory to show in a unified way that all of these schemes converge to stationary points of the objective function, almost surely, in several different settings: a) for $L$-smooth objective functions, when the variance of the stochastic gradients is possibly infinite, b) under the $(L_0,L_1)$-smoothness assumption, for heavy-tailed noise with bounded variance, and c) for $(L_0,L_1)$-smooth functions in the empirical risk minimization setting, when the variance is possibly infinite but the expectation is finite.
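The abstract's framing can be illustrated with a minimal sketch: a stochastic implicit-explicit (IMEX) Euler step for a dissipative Hamiltonian system, where the momentum/friction part is treated implicitly and the position is updated via the gradient of a kinetic energy. Choosing the relativistic kinetic energy $T(p) = \sqrt{\|p\|^2 + \varepsilon}$ yields a normalized (softly clipped) momentum update. All parameter names and values below are illustrative assumptions, not the paper's own constants, and the quadratic objective stands in for a stochastic gradient oracle.

```python
import numpy as np

def grad(x):
    # Illustrative stand-in for a stochastic gradient oracle:
    # gradient of the quadratic f(x) = 0.5 * ||x||^2.
    return x

# Hypothetical hyperparameters (step size, friction, smoothing constant).
h, gamma, eps = 0.1, 1.0, 1e-8

x = np.array([5.0, -3.0])   # position (iterate)
p = np.zeros_like(x)        # momentum

for _ in range(500):
    # Implicit Euler step in the friction/gradient part of the
    # dissipative Hamiltonian system:
    p = (p - h * grad(x)) / (1.0 + h * gamma)
    # Explicit Euler step in the position, using the gradient of the
    # relativistic kinetic energy T(p) = sqrt(||p||^2 + eps); dividing by
    # sqrt(||p||^2 + eps) normalizes (softly clips) the momentum.
    x = x + h * p / np.sqrt(np.dot(p, p) + eps)

print(np.linalg.norm(x))  # iterate settles near the stationary point x = 0
```

Other choices of kinetic energy would recover other clipping rules in the same template; the convergence guarantees discussed in the abstract concern this family of schemes under stochastic gradients, not this deterministic toy run.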
