A rationale from the frequency perspective for grokking in training neural networks
Abstract: Grokking is the phenomenon where neural networks (NNs) initially fit the training data while performing poorly on the test data, and only later generalize to the test data during training. In this paper, we empirically provide a frequency perspective to explain the emergence of this phenomenon in NNs. The core insight is that the networks initially learn the less salient frequency components present in the test data. We observe this behavior across both synthetic and real datasets, characterizing grokking through the lens of frequency dynamics during training. This empirical frequency-based analysis offers a new viewpoint on the grokking phenomenon and its underlying mechanisms.
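To make the notion of "frequency dynamics" concrete, below is a minimal sketch (not the authors' implementation) of how one can track the Fourier spectrum of a network's fit during training. It assumes a toy 1D regression task built with PyTorch and NumPy; the two-frequency target, network size, and logging schedule are all illustrative choices. Consistent with the frequency principle, one would typically expect the salient low-frequency component to be captured early and the weaker high-frequency component to be learned later.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)

# 256 evenly spaced samples on [0, 1) so that sin(2*pi*k*x) lands exactly in FFT bin k.
N = 256
x_grid = (torch.arange(N, dtype=torch.float32) / N).unsqueeze(1)

def target(x):
    # A salient low-frequency component plus a weaker high-frequency one (illustrative).
    return torch.sin(2 * np.pi * 1 * x) + 0.2 * torch.sin(2 * np.pi * 10 * x)

y_grid = target(x_grid)

# A sparse training subsample; the dense grid plays the role of "test" data.
train_idx = torch.randperm(N)[:40]
x_train, y_train = x_grid[train_idx], y_grid[train_idx]

model = nn.Sequential(nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def spectrum(y):
    # Magnitude of the one-sided FFT of a signal sampled on x_grid.
    return np.abs(np.fft.rfft(y.detach().numpy().ravel()))

k_low, k_high = 1, 10  # FFT bins of the two target frequencies

for step in range(5001):
    opt.zero_grad()
    loss = ((model(x_train) - y_train) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        s_model, s_target = spectrum(model(x_grid)), spectrum(y_grid)
        # Relative error of each frequency component of the fit: watching how
        # these errors fall over training is one way to read off the kind of
        # frequency dynamics the abstract describes.
        err = lambda k: abs(s_model[k] - s_target[k]) / s_target[k]
        print(f"step {step:5d}  train loss {loss.item():.4f}  "
              f"err@k={k_low}: {err(k_low):.3f}  err@k={k_high}: {err(k_high):.3f}")
```

The same bookkeeping extends to higher-dimensional or real data by projecting the model's outputs onto a chosen set of frequency directions rather than a full FFT grid.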