Variance Thresholding & Retained-Set PCA
- Variance thresholding and retained-set PCA are techniques that recover the support of sparse principal components in high dimensions by thresholding empirical variances and covariances.
- Covariance-thresholding refines the approach by incorporating off-diagonal information and leveraging probabilistic bounds to expand the feasible support regime for accurate recovery.
- These methods combine computational tractability with strong theoretical guarantees, using eigenanalysis on a retained coordinate set under the spiked covariance model.
Variance thresholding and retained-set principal component analysis (PCA) are algorithmic strategies for sparse PCA in high-dimensional settings, wherein the goal is to recover the support of sparse principal components (PCs) from noisy observations. These approaches combine thresholding of empirical variances or covariances with eigenanalysis restricted to a "retained" coordinate set, providing both computational tractability and theoretical guarantees for support recovery under the spiked covariance model. Covariance thresholding refines classical variance-thresholding by incorporating off-diagonal information and leveraging advanced probabilistic bounds, substantially expanding the feasible support size regime for accurate recovery (Deshpande et al., 2013).
1. Statistical Model: Sparse PCA and Spiked Covariance Framework
The core model for sparse PCA considered is the "spiked covariance" formulation. One observes $n$ independent samples $x_1, \dots, x_n \in \mathbb{R}^p$:

$$x_i = \sum_{q=1}^{r} \sqrt{\beta_q}\, u_{q,i}\, v_q + z_i,$$

where $v_1, \dots, v_r \in \mathbb{R}^p$ are orthonormal population PCs, each with $\|v_q\|_0 \le k$ ("$k$-sparse"), $u_{q,i} \sim \mathsf{N}(0,1)$, and $z_i \sim \mathsf{N}(0, I_p)$ are independent Gaussian noise vectors. The population covariance is thus $\Sigma = I_p + \sum_{q=1}^{r} \beta_q v_q v_q^{\mathsf{T}}$. Of particular interest is the high-dimensional regime, with $p$ comparable to or much larger than $n$. The principal statistical objective is precise recovery of the supports $\mathrm{supp}(v_q)$.
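A minimal sampler for this model helps fix notation. The sketch below assumes the rank-one case ($r = 1$) with a uniform-magnitude spike; the function name and all parameter values are illustrative, not from the source:

```python
import numpy as np

def sample_spiked(n, p, k, beta, rng=None):
    """Draw n samples x_i = sqrt(beta) * u_i * v + z_i from the rank-one
    spiked covariance model with a k-sparse, unit-norm spike v."""
    rng = np.random.default_rng(rng)
    v = np.zeros(p)
    support = np.sort(rng.choice(p, size=k, replace=False))
    v[support] = 1.0 / np.sqrt(k)        # uniform-magnitude k-sparse spike
    u = rng.standard_normal(n)           # spike activations u_i ~ N(0, 1)
    z = rng.standard_normal((n, p))      # isotropic noise z_i ~ N(0, I_p)
    X = np.sqrt(beta) * np.outer(u, v) + z
    return X, v, support

X, v, support = sample_spiked(n=500, p=200, k=10, beta=3.0, rng=0)
```

The population covariance of these samples is $I_p + \beta\, v v^{\mathsf{T}}$, so coordinates on the support carry excess variance $\beta v_j^2$ above the unit noise level, which is exactly what the retained-set methods below look for.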
2. Methodologies: Classical Variance-Thresholding vs. Covariance-Thresholding
Two principal retained-set PCA algorithms address sparse PC recovery in this setting:
A. Variance-Thresholding (Johnstone–Lu Diagonal Method)
- Compute the empirical covariance $\widehat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^{\mathsf{T}}$.
- Let $\widehat{\sigma}_j^2 = \widehat{\Sigma}_{jj}$ denote the empirical variances. For a threshold $\alpha > 0$, define the retained index set:

$$\widehat{B} = \{\, j \in [p] : \widehat{\sigma}_j^2 \ge 1 + \alpha \,\}.$$

(Adjustments may be made for non-unit noise variance via centering or alternative baselines.)
- Restrict $\widehat{\Sigma}$ to the principal submatrix on $\widehat{B}$, then compute its top-$r$ eigenvectors, padding with zeros off $\widehat{B}$ to obtain estimates $\widehat{v}_1, \dots, \widehat{v}_r$.
- Optional secondary thresholding on the entries of $\widehat{v}_q$ can enforce exact sparsity.
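The diagonal-thresholding steps above can be sketched as follows. The default constant in the threshold and all toy parameters are illustrative assumptions, not the paper's exact prescriptions:

```python
import numpy as np

def diagonal_thresholding(X, r=1, alpha=None, noise_var=1.0):
    """Johnstone-Lu style retained-set PCA (sketch): keep coordinates whose
    empirical variance exceeds the noise baseline by alpha, then run PCA on
    the retained principal submatrix, padding eigenvectors with zeros."""
    n, p = X.shape
    if alpha is None:
        alpha = 3.0 * np.sqrt(np.log(p) / n)   # heuristic O(sqrt(log p / n))
    var = (X ** 2).mean(axis=0)                # empirical variances (mean-zero data)
    B = np.flatnonzero(var >= noise_var + alpha)
    V = np.zeros((p, r))
    if B.size >= r:
        S = X[:, B].T @ X[:, B] / n            # principal submatrix of covariance
        _, Q = np.linalg.eigh(S)               # eigh returns ascending order
        V[B, :] = Q[:, ::-1][:, :r]            # top-r eigenvectors, zero-padded
    return V, B

# Toy run on a rank-one spiked sample (hypothetical parameters).
rng = np.random.default_rng(0)
p, n, k, beta = 200, 500, 10, 5.0
v = np.zeros(p); v[:k] = 1.0 / np.sqrt(k)
X = np.sqrt(beta) * np.outer(rng.standard_normal(n), v) + rng.standard_normal((n, p))
V_hat, B = diagonal_thresholding(X)
```

With these sizes the support coordinates have population variance $1 + \beta/k = 1.5$, comfortably above the noise baseline, so the retained set concentrates on the true support.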
B. Covariance-Thresholding (Deshpande–Montanari Method)
- Split the data into two halves. From the first half, form $\widehat{\Sigma}^{(1)} = \frac{2}{n} \sum_{i \le n/2} x_i x_i^{\mathsf{T}}$, and subtract the identity: $\widehat{\Sigma}^{(1)} - I_p$.
- Entrywise apply soft-thresholding at level $\tau/\sqrt{n}$:

$$\eta\big(z;\, \tau/\sqrt{n}\big) = \mathrm{sign}(z)\,\big(|z| - \tau/\sqrt{n}\big)_{+}.$$

- Extract the top $r$ eigenvectors $\widehat{v}_1, \dots, \widehat{v}_r$ of $\eta\big(\widehat{\Sigma}^{(1)} - I_p;\, \tau/\sqrt{n}\big)$.
- On the second half, compute $\widehat{\Sigma}^{(2)}$ and the coordinate-wise scores $s_{q,j} = \big(\widehat{\Sigma}^{(2)} \widehat{v}_q\big)_j$.
- For a threshold $\rho > 0$, define the retained set:

$$\widehat{B}_q = \{\, j \in [p] : |s_{q,j}| \ge \rho \,\}.$$

- Final support estimates are either principal eigenvectors of $\widehat{\Sigma}^{(2)}$ restricted to $\widehat{B}_q$, or thresholded versions of $\widehat{v}_q$.
3. Support Recovery Theory
The statistical guarantees describe the support size regimes in which high-probability recovery is assured, and corresponding thresholds that yield optimal behavior:
| Method | Support Size Regime | Threshold Choice | Guarantee |
|---|---|---|---|
| Variance-thresholding + retained PCA | $k \le c\,\sqrt{n/\log p}$ | $\alpha \asymp \sqrt{\log p / n}$ | Exact support recovery with high probability |
| Covariance-thresholding + retained PCA | $k \le c'\,\sqrt{n}$ | $\tau$ a sufficiently large constant (or $\tau \asymp 1$), $\rho \asymp \sqrt{\log p / n}$ | Exact support recovery with high probability |
Here, $c, c'$ are universal positive constants, $v_{\min} = \min_q \min_{j \in \mathrm{supp}(v_q)} |v_{q,j}|$ is the minimal magnitude parameter on the support of the $v_q$, and $\asymp$ indicates asymptotic proportionality.
A key finding is that covariance thresholding expands the feasible support regime for exact recovery from $k \lesssim \sqrt{n/\log p}$ to $k \lesssim \sqrt{n}$, under mild moment conditions and in certain $p \asymp n$ scenarios. The sample complexity bound in the rank-one case is $n \ge C k^2$, where $C$ is universal. Lower bounds from computational complexity theory indicate that no polynomial-time algorithm can succeed for substantially larger $k$ (Deshpande et al., 2013).
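To make the gap between the two regimes concrete, here is a back-of-envelope comparison at illustrative sizes (constants ignored; the numbers are not from the paper's experiments):

```python
import math

# At n = p = 10^4, the largest recoverable sparsity scales roughly as
# sqrt(n / log p) for diagonal (variance) thresholding versus sqrt(n)
# for covariance thresholding -- about a 3x larger support.
n, p = 10_000, 10_000
k_diag = math.sqrt(n / math.log(p))   # diagonal-thresholding scale, ~33
k_cov = math.sqrt(n)                  # covariance-thresholding scale, 100
```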
4. Analysis Techniques and Spectral Norm Bounds
The proof of tight support recovery relies on precise control of the spectral norm of the random matrices arising from the noise in $\widehat{\Sigma}$ and the effect of entrywise thresholding. Writing $W = \widehat{\Sigma} - \mathbb{E}\,\widehat{\Sigma}$ for the Gaussian cross-terms, soft-thresholding balances retention of the rank-one signal, whose support entries scale like $\beta/k \gg \tau/\sqrt{n}$, against substantial attenuation of the noise. The challenging task is bounding $\|\eta(W;\, \tau/\sqrt{n})\|_2$, where the thresholded matrix is a non-Lipschitz functional of the underlying Gaussians.
Earlier results addressed asymptotic spectral distributions or smooth functions; here, a non-asymptotic tail bound for the given kernel random matrix is required. The authors develop an $\varepsilon$-net argument combined with a new concentration lemma for non-Lipschitz Gaussian functionals, controlling $\sup_{\|y\|_2 = 1} |\langle y,\, \eta(W;\, \tau/\sqrt{n})\, y \rangle|$ via discretization of the sphere and analyzing typical Gaussian behavior versus rare bad events (Deshpande et al., 2013).
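A small simulation illustrates the phenomenon the bound formalizes; this is an empirical sanity check under assumed sizes, not the proof:

```python
import numpy as np

# For a pure-noise Wishart fluctuation matrix W = X^T X / n - I, almost
# every entry is O(1/sqrt(n)), so soft-thresholding at tau/sqrt(n)
# annihilates the bulk of the entries and collapses the spectral norm.
# Signal entries of size beta/k >> tau/sqrt(n) would largely survive.
rng = np.random.default_rng(1)
n, p, tau = 2000, 400, 4.0
X = rng.standard_normal((n, p))
W = X.T @ X / n - np.eye(p)                  # noise-only fluctuations
t = tau / np.sqrt(n)
W_soft = np.sign(W) * np.maximum(np.abs(W) - t, 0.0)
norm_raw = np.linalg.norm(W, 2)              # roughly 2*sqrt(p/n) + p/n
norm_soft = np.linalg.norm(W_soft, 2)        # much smaller after thresholding
```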
5. Practical Considerations and Threshold Selection
In applied settings, the theoretical threshold prescriptions depend on unknown quantities such as the spike strengths $\beta_q$ and the noise variance $\sigma^2$. Suggested heuristics are:
- Threshold $\tau$: select $\tau$ of moderate constant order; when $\sigma^2 = 1$ the entrywise soft-threshold level is then $\tau/\sqrt{n}$.
- Pragmatic rule: use an estimated noise-variance level, thresholding at $\tau \widehat{\sigma}^2 / \sqrt{n}$, where $\widehat{\sigma}^2$ is estimated from the median absolute deviation (MAD) of off-diagonal or diagonal elements, with the usual Gaussian consistency factor $1.4826$.
- Spectral diagnostics via the Marchenko–Pastur edge location on the thresholded matrix can also guide $\tau$ selection.
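The MAD-based estimate can be sketched as follows; the helper name and toy sizes are assumptions, and the rule is a pragmatic heuristic rather than a prescription from the paper:

```python
import numpy as np

def mad_entry_scale(S):
    """MAD-based estimate of the fluctuation scale of the off-diagonal
    covariance entries (sigma^2 / sqrt(n) for Gaussian data). The factor
    1.4826 is the standard Gaussian consistency constant for the MAD."""
    off = S[~np.eye(S.shape[0], dtype=bool)]     # off-diagonal entries
    return 1.4826 * float(np.median(np.abs(off - np.median(off))))

# Noise-only check with sigma = 2 (hypothetical sizes): empirical
# covariance entries fluctuate at scale sigma^2 / sqrt(n) ~ 0.089.
rng = np.random.default_rng(2)
n, p, sigma = 2000, 100, 2.0
X = sigma * rng.standard_normal((n, p))
scale = mad_entry_scale(X.T @ X / n)
```

Because the MAD ignores the (few) large signal-bearing entries, the estimate remains usable when a sparse spike is present.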
The table below summarizes the main computational steps and their leading costs for covariance-thresholding:
| Step | Operation | Complexity |
|---|---|---|
| 1 | Split, form $\widehat{\Sigma}^{(1)}$, $\widehat{\Sigma}^{(2)}$ | $O(np^2)$ |
| 2 | Soft-threshold $\widehat{\Sigma}^{(1)} - I_p$ | $O(p^2)$ |
| 3 | Top-$r$ eigenvectors | $O(p^2 r)$ per iteration, or $O(p^3)$ direct |
| 4 | Form $\widehat{\Sigma}^{(2)}$, actions on $\widehat{v}_q$ | $O(np^2)$ (or $O(npr)$ matrix-free) |
| 5 | Submatrix eigendecomposition | $O(|\widehat{B}_q|^3)$ |
For large-$p$ scenarios, randomized SVD or low-rank subspace iterations may provide efficient alternatives.
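One such alternative can be sketched as a block power (subspace) iteration; this is a generic stand-in rather than a method from the source, with a fixed iteration count and no convergence test:

```python
import numpy as np

def top_r_subspace(A, r, iters=100, rng=None):
    """Block power iteration for the top-r eigenpairs of a symmetric matrix
    whose dominant eigenvalues are positive -- avoids the O(p^3) cost of a
    full dense eigendecomposition when only r << p directions are needed."""
    rng = np.random.default_rng(rng)
    p = A.shape[0]
    Q = np.linalg.qr(rng.standard_normal((p, r)))[0]  # random orthonormal start
    for _ in range(iters):
        Q = np.linalg.qr(A @ Q)[0]     # O(p^2 r) per sweep; O(npr) if A is
                                       # applied matrix-free as X^T (X Q) / n
    lam = np.diag(Q.T @ A @ Q)         # Rayleigh-quotient eigenvalue estimates
    return lam, Q

# Sanity check on a small matrix with known spectrum {5, 3, 1, 0.5}.
lam, Q = top_r_subspace(np.diag([5.0, 3.0, 1.0, 0.5]), r=2, rng=0)
```

Applying $A$ matrix-free (never materializing the $p \times p$ covariance) is what makes this attractive when $p$ is very large.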
6. Connections to Literature and Limitations
Variance-thresholding and retained-set PCA, originally proposed by Johnstone and Lu, established foundational support-recovery thresholds in sparse PCA [Johnstone–Lu 2004]. Covariance-thresholding, as developed by Deshpande and Montanari, achieves strictly larger support recovery regimes without increased computational cost, and its guarantees match known computational lower bounds [berthet2013computational, ma2015sum]. These advances synthesize ideas from high-dimensional statistics, spectral random matrix theory, and computational complexity. No practical, polynomial-time algorithm currently exhibits better support recovery guarantees for the spiked covariance model under standard noise assumptions (Deshpande et al., 2013).
7. Summary and Contemporary Relevance
Variance-thresholding and retained-set PCA remain essential techniques for sparse principal component estimation in high dimensions. Covariance-thresholding generalizes diagonal thresholding, effectively leveraging both diagonal and off-diagonal information, and is distinguished by optimal support-recovery, robust performance in the high-dimensional regime, and refined non-asymptotic analysis. The introduction of sharp spectral norm bounds for thresholded random kernel matrices is a substantial technical contribution and informs the theoretical underpinnings of related algorithms in high-dimensional statistics and machine learning (Deshpande et al., 2013).