On the Selection Stability of Stability Selection and Its Applications

Published 14 Nov 2024 in stat.ME, stat.CO, and stat.ML | (2411.09097v3)

Abstract: Stability selection is a widely adopted resampling-based framework for high-dimensional variable selection. This paper seeks to broaden the use of an established stability estimator to evaluate the overall stability of stability selection results, moving beyond single-variable analysis. We suggest that the stability estimator offers two advantages: it can serve as a reference reflecting the robustness of the results obtained, and it can help identify a Pareto-optimal regularization value that improves stability. Having determined the regularization value, we calibrate the key stability selection parameters, namely the decision-making threshold and the expected number of falsely selected variables, within established theoretical bounds. In addition, the convergence of stability values over successive sub-samples sheds light on the required number of sub-samples, addressing a notable gap in prior studies. The stabplot R package is developed to facilitate the use of the methodology featured in this paper.
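
The abstract does not name the stability estimator or show how its convergence is tracked. As a minimal sketch, assuming the estimator has the form of the Nogueira, Sechidis, and Brown (2018) stability measure, the Python below computes one stability value from a binary selection matrix (one row per sub-sample, one column per variable) and then recomputes it over successive sub-samples. The function names `stability` and `running_stability` are hypothetical illustrations and are not part of the stabplot package.

```python
def stability(selections):
    """Stability of a 0/1 selection matrix (assumed Nogueira et al. form).

    selections: list of M rows, each a list of p entries in {0, 1},
    where entry j of row i is 1 if variable j was selected in sub-sample i.
    """
    m = len(selections)
    if m < 2:
        raise ValueError("at least two sub-samples are required")
    p = len(selections[0])
    # Selection frequency of each variable across the M sub-samples.
    freqs = [sum(row[j] for row in selections) / m for j in range(p)]
    # Average unbiased sample variance of the per-variable selection indicators.
    mean_var = sum(m / (m - 1) * f * (1 - f) for f in freqs) / p
    # Average number of selected variables per sub-sample.
    k_bar = sum(freqs)
    denom = (k_bar / p) * (1 - k_bar / p)
    if denom == 0:
        raise ValueError("undefined when no variable or every variable is always selected")
    return 1 - mean_var / denom


def running_stability(selections):
    """Stability recomputed on the first 2, 3, ..., M sub-samples."""
    return [stability(selections[:m]) for m in range(2, len(selections) + 1)]


if __name__ == "__main__":
    # Toy selection matrix: 4 sub-samples, 4 candidate variables.
    toy = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 1, 0],
        [1, 1, 0, 0],
    ]
    for m, value in enumerate(running_stability(toy), start=2):
        print(m, round(value, 3))
```

Under this assumption, a value near 1 means nearly identical selections across sub-samples, and a running sequence that flattens out suggests that further sub-samples would change the estimate little. Computing the same quantity along a grid of regularization values would be one way to compare candidate values, although the abstract does not detail that procedure.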