Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles
Abstract: Feature bagging is a well-established ensembling method that aims to reduce prediction variance by combining the predictions of many estimators trained on subsets or projections of the features. Here, we develop a theory of feature bagging in noisy least-squares ridge ensembles and simplify the resulting learning curves in the special case of equicorrelated data. Using these analytical learning curves, we demonstrate that subsampling shifts the double-descent peak of a linear predictor. This leads us to introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, as a computationally efficient method to mitigate double descent. We then compare the performance of a feature-subsampling ensemble to a single linear predictor, describing a trade-off between noise amplification due to subsampling and noise reduction due to ensembling. Our qualitative insights carry over to linear classifiers applied to image classification tasks on realistic datasets constructed with a state-of-the-art deep-learning feature map.
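The object the abstract studies, a ridge ensemble whose members are trained on feature subsets of possibly different sizes, is simple to state in code. Below is a minimal sketch using numpy and scikit-learn; the ensemble size, subset sizes, ridge penalty `lam`, and the toy teacher-student data are illustrative assumptions, not the paper's experimental settings.

```python
# A minimal sketch of a heterogeneous feature-subsampled ridge ensemble:
# each member is a ridge regressor fit on a random subset of features
# (subset sizes may differ), and predictions are averaged with equal weights.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def fit_feature_subsampled_ensemble(X, y, subset_sizes, lam=1e-2):
    """Fit one ridge regressor per feature subset; differing sizes
    give the 'heterogeneous' ensemble of the abstract."""
    members = []
    for m in subset_sizes:
        idx = rng.choice(X.shape[1], size=m, replace=False)  # random feature subset
        members.append((idx, Ridge(alpha=lam).fit(X[:, idx], y)))
    return members

def predict_ensemble(members, X):
    """Average the member predictions (equal weights)."""
    return np.mean([est.predict(X[:, idx]) for idx, est in members], axis=0)

# Toy usage: a noisy linear teacher with N = 100 features, P = 80 samples.
N, P = 100, 80
w = rng.standard_normal(N) / np.sqrt(N)
X = rng.standard_normal((P, N))
y = X @ w + 0.1 * rng.standard_normal(P)

members = fit_feature_subsampled_ensemble(X, y, subset_sizes=[40, 60, 90])
X_test = rng.standard_normal((1000, N))
print(predict_ensemble(members, X_test).shape)  # (1000,)
```

The illustrative subset sizes straddle the sample count P on purpose: since subsampling shifts each member's double-descent peak, spreading the sizes means no single estimator sits exactly at its peak, which is the intuition behind using heterogeneous ensembling to mitigate double descent.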