Graded Fast Hard Thresholding Pursuit (GFHTP)
- GFHTP is an iterative greedy algorithm that recovers sparse signals by incrementally identifying the support with hard thresholding, without requiring prior sparsity knowledge.
- It robustly handles sparse regression with LAD loss by using masked subgradient steps and quantile-based outlier detection to ensure global linear convergence under RIP conditions.
- The algorithm demonstrates competitive empirical performance in high-dimensional settings, scaling efficiently and achieving fast recovery on synthetic datasets and MNIST image tasks.
Graded Fast Hard Thresholding Pursuit (GFHTP) is an iterative greedy selection algorithm for finding sparse solutions to underdetermined or highly contaminated linear systems, particularly under sparsity and outlier-robustness constraints. GFHTP applies to both general convex sparsity-constrained objectives and specialized statistical estimation tasks, notably sparse regression with least absolute deviations (LAD) loss. Its defining characteristics are a "graded" support scheme that incrementally identifies the sparsity pattern and a streamlined update rule comprising hard thresholding operations, all without requiring prior knowledge of the underlying solution’s sparsity level or extensive user parameter tuning (Yuan et al., 2013, Xu et al., 10 Jan 2026).
1. Problem Setting and Algorithmic Structure
GFHTP addresses the general class of sparsity-constrained convex optimization problems
$$\min_{x \in \mathbb{R}^d} f(x) \quad \text{subject to} \quad \|x\|_0 \le k,$$
where $f$ is a smooth convex function and $k$ is a prescribed sparsity level (Yuan et al., 2013). In the context of robust signal recovery, it can be specialized to minimizing the $\ell_1$ (LAD) loss under a sparsity constraint,
$$\min_{x \in \mathbb{R}^d} \|y - Ax\|_1 \quad \text{subject to} \quad \|x\|_0 \le s,$$
where $A \in \mathbb{R}^{n \times d}$ is known, $y \in \mathbb{R}^n$ is the observation, $x$ the sparse parameter of interest, and $s$ the (unknown) true sparsity (Xu et al., 10 Jan 2026).
The GFHTP update consists of two core steps per iteration:
- Gradient or subgradient step appropriate to the loss $f$;
- Hard thresholding to project onto the $k$-sparse set, using the operator $H_k(\cdot)$, which keeps the $k$ entries of its argument with largest absolute value and sets the remainder to zero.

In the "graded" version for robust LAD recovery, the support size is incrementally grown without knowledge of $s$; i.e., at outer iteration $t$, the support is of size $t$, enabling automatic sparsity discovery (Xu et al., 10 Jan 2026).
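As an illustration, the hard thresholding operator described above can be sketched in a few lines of NumPy (the function name is ours):

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest.

    Illustrative implementation of the hard thresholding operator H_k;
    np.argpartition finds the top-k indices by |x_i| in expected O(d) time.
    """
    z = np.zeros_like(x)
    if k <= 0:
        return z
    top_k = np.argpartition(np.abs(x), -k)[-k:]
    z[top_k] = x[top_k]
    return z
```

For example, `hard_threshold(np.array([3.0, -1.0, 0.5, -4.0]), 2)` keeps only the entries `3.0` and `-4.0`. In the graded variant, `k` is simply the outer-iteration counter `t`.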
2. Algorithmic Details and Implementation
GFHTP for Smooth Convex Objectives
For $f$ smooth and convex, the update at iteration $t$ is:
$$x^{t+1} = H_k\big(x^t - \eta \nabla f(x^t)\big).$$
- Terminate when $x^{t+1} = x^t$ (or the iterate change falls below a tolerance).
No debiasing or grading step is used; this yields the "fast" variant compared to GraHTP with debiasing (Yuan et al., 2013).
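A minimal sketch of this fast variant for the smooth least-squares loss $f(x) = \tfrac{1}{2}\|Ax - y\|^2$ follows; the function name, step size, and stopping defaults are our assumptions, not the paper's:

```python
import numpy as np

def fast_htp(A, y, k, eta, max_iter=300, tol=1e-8):
    """Fast (no-debiasing) hard thresholding sketch for f(x) = 0.5||Ax - y||^2.

    Update: x <- H_k(x - eta * grad f(x)); stop when the iterate stalls.
    Convergence requires eta below 1 / ||A^T A||_2 (monotone descent regime).
    """
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        g = A.T @ (A @ x - y)                     # gradient of the LS loss
        v = x - eta * g                           # gradient step
        top_k = np.argpartition(np.abs(v), -k)[-k:]
        x_new = np.zeros_like(x)
        x_new[top_k] = v[top_k]                   # hard threshold to k-sparse set
        if np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x), 1.0):
            x = x_new
            break
        x = x_new
    return x
```

On a well-conditioned Gaussian design with a few nonzeros, this loop drives the residual down rapidly while keeping every iterate exactly $k$-sparse.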
GFHTP for LAD with Outliers
For the LAD problem with high contamination, the update scheme involves:
- Outer iteration: At step $t$, compute a quantile of the current absolute residuals $|y - Ax^t|$, identify the "small" residuals, and mask the large outliers.
- Subgradient step: Use the truncated subgradient $g^t$ (the LAD subgradient weighted by the indicator of small residuals).
- Support update: Apply the operator $H_t$ to $x^t - \eta g^t$, where $g^t$ is the masked subgradient.
- Inner loop: Perform restricted subgradient steps on the candidate support $S^t$.
- Step-size: Scaled by a median/truncated residual norm, parameterized by a constant $c$ (fixed, e.g., $c = 6$) for stability (Xu et al., 10 Jan 2026).
The process continues until the masked LAD loss drops below a tolerance. The entire procedure is essentially parameter-free: only a moderate-scale step-size constant $c$ must be set, and there is no need for manual specification of the sparsity $s$.
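The outer/inner structure above can be sketched as follows. The paper's exact step-size formula, quantile handling, and inner-loop schedule differ in detail, so everything here (names, defaults, the specific median-based step size) should be read as an illustrative approximation:

```python
import numpy as np

def graded_lad_htp(A, y, n_outer, tau=0.5, c=6.0, n_inner=10):
    """Illustrative sketch of the graded LAD scheme: quantile-masked
    subgradient step, hard threshold to a support of size t, then a few
    restricted subgradient steps on that support."""
    d = A.shape[1]
    x = np.zeros(d)
    for t in range(1, n_outer + 1):
        r = y - A @ x
        mask = np.abs(r) <= np.quantile(np.abs(r), tau)   # keep "small" residuals
        g = -A[mask].T @ np.sign(r[mask])                 # masked LAD subgradient
        eta = c * np.median(np.abs(r)) / (np.linalg.norm(g) + 1e-12)
        v = x - eta * g
        supp = np.argpartition(np.abs(v), -t)[-t:]        # grade: support size t
        x = np.zeros(d)
        x[supp] = v[supp]
        for j in range(n_inner):                          # restricted refinement
            r = y - A @ x
            mask = np.abs(r) <= np.quantile(np.abs(r), tau)
            g_s = -A[mask][:, supp].T @ np.sign(r[mask])
            x[supp] -= eta / (j + 1) * g_s                # diminishing inner steps
    return x
```

Note how the support size is tied to the outer counter `t`, so the sparsity level is discovered rather than supplied.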
3. Theoretical Guarantees
General Convex Case
GFHTP achieves linear convergence under a graded restricted contraction condition (Condition C). Specifically, if the step-size $\eta$ and sparsity level $k$ are chosen so that Condition C holds with contraction factor $\rho < 1$, then
$$\|x^{t+1} - \bar{x}\| \le \rho \, \|x^t - \bar{x}\|,$$
yielding geometric convergence and exact finite-step recovery when $\bar{x}$ is an exact $k$-sparse minimizer (Yuan et al., 2013).
Robust LAD Setting
Assuming the design matrix $A$ satisfies the restricted 1-isometry property (RIP) of appropriate order and the contamination fraction is bounded strictly below $1/2$, GFHTP shows:
- Global linear convergence: For any $s$-sparse signal $x^\star$, the estimation error $\|x^t - x^\star\|$ contracts geometrically across outer steps, for suitable step-size and quantile parameters.
- Exact support/signal recovery for flat signals: For sufficiently "flat" signals $x^\star$ (nonzero entries of comparable magnitude) and Gaussian designs $A$, exact support identification and reconstruction occur after $s$ outer steps with high probability (Xu et al., 10 Jan 2026).
The proof strategy leverages contraction from inner subgradient updates and a support-matching induction relying on quantile concentration and RIP structural control.
4. Computational Complexity and Practical Considerations
The per-iteration complexity of GFHTP is as follows:
| Operation | Complexity (per iter) | Context |
|---|---|---|
| Gradient or subgradient computation | $O(nd)$ | LAD and general convex loss |
| Quantile computation (LAD) | $O(n)$ | Needed to mask outliers (LAD only) |
| Hard thresholding | $O(d \log d)$ or $O(d)$ | Full sort, or partial sort/selection |
| Inner loop (LAD, support size $t$) | $O(nt)$ | Per inner subgradient step |
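The "partial sort/selection" entry can be checked directly: selection via `np.argpartition` returns the same top-$k$ index set as a full $O(d \log d)$ sort, in expected $O(d)$ time:

```python
import numpy as np

# Compare full-sort top-k against selection-based top-k on a random vector.
rng = np.random.default_rng(42)
v = rng.standard_normal(10_000)
k = 25

top_by_sort = set(np.argsort(np.abs(v))[-k:])              # O(d log d) full sort
top_by_select = set(np.argpartition(np.abs(v), -k)[-k:])   # expected O(d) selection
assert top_by_sort == top_by_select
```

With continuous random entries, ties occur with probability zero, so the two index sets coincide.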
For most settings, with a small, fixed number of inner steps per outer stage, the overall cost is dominated by the $O(nd)$ (sub)gradient computations, yielding roughly $O(s \cdot nd)$ total work per run (Xu et al., 10 Jan 2026). Compared to PSGD and AIHT, GFHTP is often faster when the true sparsity is unknown or the data is heavily contaminated.
Parameter selection is minimal: only the step-size scaling constant $c$ is user-set (in a moderate theoretical range; e.g., $c = 6$ suffices in experiments), and the quantile parameter $\tau$ is typically set near $0.5$. The stopping criterion is based on the relative iterate change, or on a cap on the maximum number of iterations.
5. Empirical Performance and Applications
GFHTP shows strong empirical performance on both synthetic and real data:
- Synthetic sparse signal recovery: In experiments with Gaussian designs across a range of problem sizes, sparsity levels up to $15$, and outlier fractions up to $0.5$, GFHTP achieves near-perfect support recovery (success rate $1.00$) and low relative error, with CPU time competitive with or superior to other greedy algorithms (AIHT, PSGD).
- MNIST image recovery: For image vectors observed through random Gaussian projections with outlier corruption, GFHTP reaches an SNR of 80 dB in 9 ms, compared to PSGD's SNR of 5 dB in over 1 s.
These results underline the robustness, computational efficiency, and adaptivity of GFHTP in outlier-prone and high-dimensional settings (Xu et al., 10 Jan 2026, Yuan et al., 2013).
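For reference, the SNR figures quoted above follow the usual reconstruction-SNR definition in decibels; a quick sketch (the helper name is ours):

```python
import numpy as np

def snr_db(x_true, x_hat):
    """Reconstruction SNR in decibels: 20 * log10(||x|| / ||x - x_hat||)."""
    return 20.0 * np.log10(
        np.linalg.norm(x_true) / np.linalg.norm(x_true - x_hat)
    )
```

Under this definition, an estimate with relative error $10^{-4}$ scores 80 dB, the scale reported for GFHTP on MNIST.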
6. Advantages, Limitations, and Extensions
GFHTP provides the following core advantages:
- No a priori knowledge of sparsity needed; automatic graded support growth.
- Nearly parameter-free, requiring only a single moderate step-size scale.
- Provable linear convergence under RIP-like structural assumptions.
- Robustness to large-magnitude outliers via adaptive masked subgradients.
- Efficient for high-dimensional or contaminated regimes.
Limitations include the overhead of quantile computation for very large datasets, a restriction to random Gaussian-type designs in the current theory, and possibly conservative theoretical step-size ranges.
Potential extensions outlined include:
- Adaptive or line-searched step-size schemes.
- Momentum or Nesterov acceleration in the inner optimization loops.
- Generalizations to nonlinear or low-rank matrix recovery settings.
- Distributed and online adaptations for streaming scenarios.
- Data-driven automation for the quantile truncation parameter.
A plausible implication is the broader applicability of the "graded" hard thresholding paradigm to a range of nonconvex, high-dimensional, and robust estimation problems beyond the LAD ($\ell_1$) loss. However, extensions to structured or compressive measurement matrices and rigorous theoretical guarantees for such cases remain prominent avenues for investigation (Xu et al., 10 Jan 2026).