On the energy landscape of deep networks

Published 20 Nov 2015 in cs.LG | (1511.06485v5)

Abstract: We introduce "AnnealSGD", a regularized stochastic gradient descent algorithm motivated by an analysis of the energy landscape of a particular class of deep networks with sparse random weights. The loss function of such networks can be approximated by the Hamiltonian of a spherical spin glass with Gaussian coupling. While different from currently popular architectures such as convolutional ones, spin glasses are amenable to analysis, which provides insights on the topology of the loss function and motivates algorithms to minimize it. Specifically, we show that a regularization term akin to a magnetic field can be modulated with a single scalar parameter to transition the loss function from a complex, non-convex landscape with exponentially many local minima, to a phase with a polynomial number of minima, all the way down to a trivial landscape with a unique minimum. AnnealSGD starts training in the relaxed polynomial regime and gradually tightens the regularization parameter to steer the energy towards the original exponential regime. Even for convolutional neural networks, which are quite unlike sparse random networks, we empirically show that AnnealSGD improves the generalization error using competitive baselines on MNIST and CIFAR-10.
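
The annealing idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm as published: the abstract does not give the exact form of the regularizer or its schedule, so the code below assumes a linear "field" term -nu * <h, w> added to the loss, a fixed Gaussian direction h, and an exponential decay of the scalar nu. The function name anneal_sgd_sketch and all of its parameters are hypothetical.

```python
import numpy as np


def anneal_sgd_sketch(grad_fn, w0, steps=200, lr=0.1, nu0=1.0, decay=0.05, seed=0):
    """Gradient descent on loss(w) - nu_t * <h, w>, with nu_t annealed toward 0.

    grad_fn: callable returning the (possibly minibatch) gradient of the loss at w.
    The field direction h and the exponential schedule are assumptions made
    for illustration only; they are not taken from the abstract.
    """
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    # Fixed random direction playing the role of the "magnetic field".
    h = rng.standard_normal(w.shape)
    for t in range(steps):
        # Single scalar controlling the strength of the regularization term.
        nu = nu0 * np.exp(-decay * t)
        # Gradient of loss(w) - nu * <h, w> with respect to w.
        g = grad_fn(w) - nu * h
        w -= lr * g
    return w


if __name__ == "__main__":
    # Toy quadratic loss 0.5 * ||w - target||^2, whose gradient is w - target.
    target = np.array([1.0, -2.0, 0.5])
    w_final = anneal_sgd_sketch(lambda w: w - target, np.zeros(3))
    print(w_final)
```

Early in the run the field term dominates and tilts the landscape toward a single direction; as nu shrinks, the update approaches plain gradient descent on the original loss. On the toy quadratic the iterate ends close to target, since the perturbation has almost vanished by the last step.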

Citations (27)
