Papers
Topics
Authors
Recent
Search
2000 character limit reached

Grams: Gradient Descent with Adaptive Momentum Scaling

Published 22 Dec 2024 in cs.LG, cs.AI, cs.DS, and math.OC | (2412.17107v3)

Abstract: We introduce $\mathbf{G}$radient Descent with $\mathbf{A}$daptive $\mathbf{M}$omentum $\mathbf{S}$caling ($\mathbf{Grams}$), a novel optimization algorithm that decouples the direction and magnitude of parameter updates in deep learning. Unlike traditional optimizers that directly integrate momentum into updates, Grams separates the update direction, derived from current gradients, from momentum, which is used solely for adaptive magnitude scaling. This approach enables Grams to achieve improved loss descent compared to state-of-the-art cautious and momentum-based optimizers. We theoretically demonstrate that Grams descents faster than other state-of-the-art optimizers and establish a global convergence guarantee for Grams. We also validate its effectiveness through extensive empirical evaluations. The results demonstrate Grams' superior performance, including faster convergence and better generalization, compared to widely-used optimizers such as Adam, Lion, and their cautious variants. Our results highlight Grams' potential as a transformative approach for efficiently training and fine-tuning LLMs. Code is available at https://github.com/Gunale0926/Grams.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (3)

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 2 tweets with 0 likes about this paper.