An Adaptive Tangent Feature Perspective of Neural Networks

Published 29 Aug 2023 in cs.LG and cs.CV | arXiv:2308.15478v3

Abstract: In order to better understand feature learning in neural networks, we propose a framework for understanding linear models in tangent feature space where the features are allowed to be transformed during training. We consider linear transformations of features, resulting in a joint optimization over parameters and transformations with a bilinear interpolation constraint. We show that this optimization problem has an equivalent linearly constrained optimization with structured regularization that encourages approximately low rank solutions. Specializing to neural network structure, we gain insights into how the features and thus the kernel function change, providing additional nuance to the phenomenon of kernel alignment when the target function is poorly represented using tangent features. We verify our theoretical observations in the kernel alignment of real neural networks.
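To make the setup concrete, the sketch below illustrates one plausible reading of the framework described in the abstract: the tangent features are the parameter-gradients of a network at a reference point, and a linear transform M of those features is optimized jointly with the linear coefficients theta. The toy network, the transform M, the coefficients theta, and the simple quadratic penalty are illustrative assumptions, not the paper's exact formulation (the paper derives an equivalent constrained problem with a structured regularizer that encourages approximately low-rank solutions).

```python
# A minimal sketch (assumed formulation, not the paper's exact one) of a linear
# model on tangent features with a learnable linear feature transform M.
import jax
import jax.numpy as jnp

def net(w, x):
    # Toy two-layer network f(x; w) with scalar output; w = (W1, w2).
    W1, w2 = w
    return jnp.tanh(x @ W1) @ w2

def tangent_features(w0, x):
    # Tangent feature phi(x): gradient of the output w.r.t. all parameters at w0,
    # flattened into one vector (the feature map of the neural tangent kernel).
    grads = jax.grad(lambda w: net(w, x))(w0)
    return jnp.concatenate([g.ravel() for g in jax.tree_util.tree_leaves(grads)])

def predict(params, phi):
    # Linear model on transformed tangent features: f(x) = <theta, M phi(x)>.
    M, theta = params
    return (M @ phi) @ theta

def loss(params, Phi, y, lam=1e-3):
    # Joint objective over (M, theta); the quadratic penalty is a stand-in for
    # the structured regularization described in the abstract.
    M, theta = params
    preds = jax.vmap(lambda phi: predict(params, phi))(Phi)
    return jnp.mean((preds - y) ** 2) + lam * (jnp.sum(M ** 2) + jnp.sum(theta ** 2))
```

Here Phi would be the matrix of tangent features of the training inputs, and both M and theta can then be fit with any gradient-based optimizer (e.g., via jax.grad(loss)).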
