Papers
Topics
Authors
Recent
Search
2000 character limit reached

Changing the Kernel During Training Leads to Double Descent in Kernel Regression

Published 3 Nov 2023 in stat.ML, cs.LG, math.OC, and stat.ME | (2311.01762v3)

Abstract: We investigate changing the bandwidth of a translational-invariant kernel during training when solving kernel regression with gradient descent. We present a theoretical bound on the out-of-sample generalization error that advocates for decreasing the bandwidth (and thus increasing the model complexity) during training. We further use the bound to show that kernel regression exhibits a double descent behavior when the model complexity is expressed as the minimum allowed bandwidth during training. Decreasing the bandwidth all the way to zero results in benign overfitting, and also circumvents the need for model selection. We demonstrate the double descent behavior on real and synthetic data and also demonstrate that kernel regression with a decreasing bandwidth outperforms that of a constant bandwidth, selected by cross-validation or marginal likelihood maximization. We finally apply our findings to neural networks, demonstrating that by modifying the neural tangent kernel (NTK) during training, making the NTK behave as if its bandwidth were decreasing to zero, we can make the network overfit more benignly, and converge in fewer iterations.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.