Enhancing Accuracy in Deep Learning Using Random Matrix Theory
Abstract: We explore applications of random matrix theory (RMT) to the training of deep neural networks (DNNs), focusing on layer pruning, that is, reducing the number of DNN parameters (weights). Our numerical results show that this pruning drastically reduces the number of parameters without reducing the accuracy of DNNs and CNNs. Moreover, pruning the fully connected DNNs actually increases accuracy and decreases the variance across random initializations. Our numerical experiments indicate that this gain in accuracy is due to a simplification of the loss landscape. We then provide a rigorous mathematical underpinning for these numerical results by proving an RMT-based Pruning Theorem. Our results offer valuable insights into the practical application of RMT for building more efficient and accurate deep learning models.
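As a rough illustration of the kind of RMT-based pruning the abstract describes, the sketch below (our own Python, not the paper's code) applies an SVD to a weight matrix and discards singular values that fall below the Marchenko–Pastur bulk edge, i.e., those indistinguishable from noise. The function name `mp_prune` and the noise-scale heuristic are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def mp_prune(W, sigma=None):
    """RMT-SVD pruning sketch: keep only singular values above the
    Marchenko-Pastur bulk edge and rebuild W from those components.
    The sigma estimator below is a crude heuristic, not the paper's."""
    n, m = W.shape
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    if sigma is None:
        # Heuristic noise scale: assume most singular values are noise,
        # so the median sits inside the Marchenko-Pastur bulk.
        sigma = 2.0 * np.median(s) / (np.sqrt(n) + np.sqrt(m))
    # The largest singular value of an n x m i.i.d.-noise matrix with
    # entry std sigma concentrates near sigma * (sqrt(n) + sqrt(m)).
    edge = sigma * (np.sqrt(n) + np.sqrt(m))
    k = int(np.sum(s > edge))           # number of retained "signal" values
    W_pruned = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return W_pruned, k

# Toy usage: rank-5 signal plus Gaussian noise of known std 0.5.
rng = np.random.default_rng(0)
n, m, r = 400, 300, 5
signal = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))
noisy = signal + 0.5 * rng.normal(size=(n, m))
pruned, rank = mp_prune(noisy, sigma=0.5)
print(rank)  # should recover roughly r = 5 signal components
```

In this sketch the pruned matrix is stored as a low-rank factorization, which is where the parameter reduction comes from: an n x m layer collapses to k(n + m) parameters when k signal components remain.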