Papers
Topics
Authors
Recent
Search
2000 character limit reached

Thresholded Lasso for high dimensional variable selection

Published 27 Sep 2023 in math.ST and stat.TH | (2309.15355v1)

Abstract: Given $n$ noisy samples with $p$ dimensions, where $n \ll p$, we show that the multi-step thresholding procedure based on the Lasso -- we call it the {\it Thresholded Lasso}, can accurately estimate a sparse vector $\beta \in {\mathbb R}p$ in a linear model $Y = X \beta + \epsilon$, where $X_{n \times p}$ is a design matrix normalized to have column $\ell_2$-norm $\sqrt{n}$, and $\epsilon \sim N(0, \sigma2 I_n)$. We show that under the restricted eigenvalue (RE) condition, it is possible to achieve the $\ell_2$ loss within a logarithmic factor of the ideal mean square error one would achieve with an {\em oracle } while selecting a sufficiently sparse model -- hence achieving {\it sparse oracle inequalities}; the oracle would supply perfect information about which coordinates are non-zero and which are above the noise level. We also show for the Gauss-Dantzig selector (Cand`{e}s-Tao 07), if $X$ obeys a uniform uncertainty principle, one will achieve the sparse oracle inequalities as above, while allowing at most $s_0$ irrelevant variables in the model in the worst case, where $s_0 \leq s$ is the smallest integer such that for $\lambda = \sqrt{2 \log p/n}$, $\sum_{i=1}p \min(\beta_i2, \lambda2 \sigma2) \leq s_0 \lambda2 \sigma2$. Our simulation results on the Thresholded Lasso match our theoretical analysis excellently.

Summary

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.