On the design-dependent suboptimality of the Lasso
Abstract: This paper investigates the effect of the design matrix on the ability (or inability) to estimate a sparse parameter in linear regression. More specifically, we characterize the optimal rate of estimation when the smallest singular value of the design matrix is bounded away from zero. In addition to this information-theoretic result, we provide and analyze a procedure that is simultaneously statistically optimal and computationally efficient, based on soft-thresholding the ordinary least squares estimator. Most surprisingly, we show that the Lasso estimator -- despite its widespread adoption for sparse linear regression -- is provably minimax rate-suboptimal when the minimum singular value is small. We present a family of design matrices and sparse parameters for which we can guarantee that the Lasso with any choice of regularization parameter -- including those which are data-dependent and randomized -- fails in the sense that its estimation rate is suboptimal by polynomial factors in the sample size. Our lower bound is strong enough to preclude the statistical optimality of all forms of the Lasso, including its highly popular penalized, norm-constrained, and cross-validated variants.
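The optimal procedure named in the abstract, soft-thresholding the ordinary least squares estimator, admits a compact implementation. The sketch below is illustrative only: it assumes Gaussian noise with known level `sigma` and a full-rank design with at least as many samples as dimensions, and the per-coordinate threshold (each coordinate's standard error inflated by a sqrt(2 log d) factor) is a standard calibration, not necessarily the paper's exact tuning.

```python
import numpy as np

def soft_threshold(v, lam):
    """Coordinatewise soft thresholding: sign(v) * max(|v| - lam, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def thresholded_ols(X, y, sigma):
    """Soft-threshold the OLS estimate of a sparse parameter.

    Assumes X is n x d with full column rank (n >= d) and
    y = X @ theta + noise, with i.i.d. N(0, sigma^2) noise. The threshold
    calibration below is a standard illustrative choice, not necessarily
    the paper's exact one.
    """
    n, d = X.shape
    theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Cov(theta_ols) = sigma^2 (X^T X)^{-1}; threshold each coordinate at
    # its standard error times sqrt(2 log d) to zero out pure-noise entries.
    se = sigma * np.sqrt(np.diag(np.linalg.inv(X.T @ X)))
    return soft_threshold(theta_ols, se * np.sqrt(2.0 * np.log(d)))
```

One design choice worth noting in this sketch: the thresholds adapt per coordinate through the diagonal of (X^T X)^{-1}, so they automatically inflate when the design is poorly conditioned, which is precisely the small-minimum-singular-value regime in which the abstract asserts the Lasso is polynomially suboptimal.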