
Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment

Published 31 May 2024 in stat.ML and cs.LG (arXiv:2406.00127v1)

Abstract: During neural network training, the sharpness of the Hessian matrix of the training loss rises until training is on the edge of stability. As a result, even nonstochastic gradient descent does not accurately model the underlying dynamical system defined by the gradient flow of the training loss. We use an exponential Euler solver to train the network without entering the edge of stability, so that we accurately approximate the true gradient descent dynamics. We demonstrate experimentally that the increase in the sharpness of the Hessian matrix is caused by the layerwise Jacobian matrices of the network becoming aligned, so that a small change in the network preactivations near the inputs of the network can cause a large change in the outputs of the network. We further demonstrate that the degree of alignment scales with the size of the dataset by a power law with a coefficient of determination between 0.74 and 0.98.
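The abstract attributes the rise in sharpness to the layerwise Jacobian matrices becoming aligned. As a rough illustration of that idea (not the authors' measurement), the sketch below builds a toy tanh MLP in PyTorch and compares the spectral norm of the end-to-end input-output Jacobian with the product of the layerwise Jacobians' spectral norms. By submultiplicativity this ratio is at most 1, and values near 1 indicate alignment: a small perturbation near the inputs can produce a nearly maximal change at the outputs. The layer widths, activation, and the specific ratio used here are illustrative assumptions.

```python
# Illustrative sketch of layerwise Jacobian alignment (assumed proxy metric,
# not the paper's exact procedure): compare the spectral norm of the
# end-to-end Jacobian with the product of layerwise spectral norms.
import torch

torch.manual_seed(0)
dims = [8, 16, 16, 4]  # toy layer widths (assumption)
weights = [torch.randn(dims[i + 1], dims[i]) for i in range(len(dims) - 1)]

def layer(h, W):
    """One layer: preactivation z = W h, activation tanh(z)."""
    return torch.tanh(W @ h)

x = torch.randn(dims[0])

# Layerwise Jacobians d h_{l+1} / d h_l evaluated along the forward pass.
jacobians = []
h = x
for W in weights:
    J = torch.autograd.functional.jacobian(lambda v, W=W: layer(v, W), h)
    jacobians.append(J)
    h = layer(h, W)

# By the chain rule, the end-to-end Jacobian is the product of the
# layerwise Jacobians (applied from last layer to first).
J_total = jacobians[-1]
for J in reversed(jacobians[:-1]):
    J_total = J_total @ J

spectral = lambda M: torch.linalg.matrix_norm(M, ord=2)
alignment_ratio = spectral(J_total) / torch.prod(
    torch.stack([spectral(J) for J in jacobians])
)
# Ratio is <= 1; values near 1 mean the layerwise Jacobians are aligned.
print(f"alignment ratio: {alignment_ratio.item():.3f}")
```

On a randomly initialized network this ratio is typically well below 1; the paper's claim, restated in these terms, is that training drives it upward, which in turn drives the sharpness of the loss Hessian toward the edge-of-stability threshold.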
