Papers
Topics
Authors
Recent
Search
2000 character limit reached

On the Convergence of AdaBound and its Connection to SGD

Published 13 Aug 2019 in cs.LG, math.OC, and stat.ML | (1908.04457v2)

Abstract: Adaptive gradient methods such as Adam have gained extreme popularity due to their success in training complex neural networks and less sensitivity to hyperparameter tuning compared to SGD. However, it has been recently shown that Adam can fail to converge and might cause poor generalization -- this lead to the design of new, sophisticated adaptive methods which attempt to generalize well while being theoretically reliable. In this technical report we focus on AdaBound, a promising, recently proposed optimizer. We present a stochastic convex problem for which AdaBound can provably take arbitrarily long to converge in terms of a factor which is not accounted for in the convergence rate guarantee of Luo et al. (2019). We present a new $O(\sqrt T)$ regret guarantee under different assumptions on the bound functions, and provide empirical results on CIFAR suggesting that a specific form of momentum SGD can match AdaBound's performance while having less hyperparameters and lower computational costs.

Citations (18)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.