
Understanding overfitting peaks in generalization error: Analytical risk curves for $l_2$ and $l_1$ penalized interpolation

Published 9 Jun 2019 in cs.LG, physics.data-an, and stat.ML | (1906.03667v1)

Abstract: Traditionally in regression one minimizes the number of fitting parameters or uses smoothing/regularization to trade off training error (TE) against generalization error (GE). Driving TE to zero by increasing fitting degrees of freedom (dof) is expected to increase GE. However, modern big-data approaches, including deep nets, seem to over-parametrize and send TE to zero (data interpolation) without impacting GE. Over-parametrization has the benefit that global minima of the empirical loss function proliferate and become easier to find. These phenomena have drawn theoretical attention. Regression and classification algorithms have been shown to interpolate data while also generalizing optimally. An interesting related phenomenon has been noted: the existence of non-monotonic risk curves, with a peak in GE with increasing dof. It was suggested that this peak separates a classical regime from a modern regime where over-parametrization improves performance. Similar overfitting peaks were reported previously (in the statistical physics approach to learning) and attributed to increased fitting model flexibility. We introduce a generative and fitting model pair ("Misparametrized Sparse Regression" or MiSpaR) and show that the overfitting peak can be dissociated from the point at which the fitting function gains enough dof to match the data generative model and thus provides good generalization. This complicates the interpretation of overfitting peaks as separating a "classical" from a "modern" regime. Data interpolation itself cannot guarantee good generalization: we need to study the interpolation with different penalty terms. We present analytical formulae for GE curves for MiSpaR with $l_2$ and $l_1$ penalties, in the interpolating limit $\lambda\rightarrow 0$. These risk curves exhibit important differences and help elucidate the underlying phenomena.
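
The interpolating limit $\lambda\rightarrow 0$ of the $l_2$ penalty can be explored numerically with a small toy experiment. The sketch below is not the paper's MiSpaR construction or its analytical formulae; it assumes a Gaussian design, a sparse linear generative model whose first `k` coefficients are nonzero, and a fitting model that uses only the first `p` features. All names (`risk_curve`, `ridge_fit`, `min_l2_norm_fit`) and parameter values are hypothetical choices made for illustration. The $l_1$ case would need a linear-programming solver and is omitted here.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """l2-penalized least squares, computed stably through the SVD of X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Each singular direction is shrunk by s / (s^2 + lam).
    return Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))


def min_l2_norm_fit(X, y):
    """Limit lam -> 0 of ridge: the Moore-Penrose pseudoinverse solution.

    For p <= n this is ordinary least squares; for p > n it is the
    minimum-l2-norm coefficient vector that interpolates the data.
    """
    return np.linalg.pinv(X) @ y


def risk_curve(n=40, P=120, k=10, sigma=0.5, n_test=2000, seed=0):
    """Empirical TE and GE versus the number p of fitted features.

    Assumed toy setup (not taken from the paper): P candidate features,
    only the first k enter the generative model, noise level sigma.
    """
    rng = np.random.default_rng(seed)
    beta = np.zeros(P)
    beta[:k] = rng.normal(size=k)            # sparse generative coefficients
    X = rng.normal(size=(n, P))
    y = X @ beta + sigma * rng.normal(size=n)
    X_test = rng.normal(size=(n_test, P))
    y_test = X_test @ beta                   # noiseless test targets

    ps = np.arange(1, P + 1)
    te = np.empty(P)
    ge = np.empty(P)
    for i, p in enumerate(ps):
        b = min_l2_norm_fit(X[:, :p], y)     # fit with the first p features
        te[i] = np.mean((X[:, :p] @ b - y) ** 2)
        ge[i] = np.mean((X_test[:, :p] @ b - y_test) ** 2)
    return ps, te, ge


if __name__ == "__main__":
    ps, te, ge = risk_curve()
    for p in (5, 10, 20, 39, 40, 41, 80, 120):
        print(f"p={p:4d}  TE={te[p - 1]:.3e}  GE={ge[p - 1]:.3e}")
```

In this toy setup the fitting model first contains the full generative model at `p = k`, while the training error reaches zero only once `p` reaches `n`. Choosing `k` well below `n` therefore lets one inspect the two points separately, which is the kind of dissociation the abstract describes.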

Citations (50)

Authors (1)
