
Graded Fast Hard Thresholding Pursuit (GFHTP₁)

Updated 17 January 2026
  • GFHTP₁ is an iterative greedy algorithm that recovers sparse signals by incrementally identifying the support with hard thresholding, without requiring prior sparsity knowledge.
  • It robustly handles sparse regression with LAD loss by using masked subgradient steps and quantile-based outlier detection to ensure global linear convergence under RIP conditions.
  • The algorithm demonstrates competitive empirical performance in high-dimensional settings, scaling efficiently and achieving fast recovery on synthetic datasets and MNIST image tasks.

Graded Fast Hard Thresholding Pursuit (GFHTP₁) is an iterative greedy selection algorithm for finding sparse solutions to underdetermined or highly contaminated linear systems, particularly under sparsity and outlier-robustness constraints. GFHTP₁ applies to both general convex sparsity-constrained objectives and specialized statistical estimation tasks, notably sparse regression with least absolute deviations (LAD) loss. Its defining characteristics are a "graded" support scheme that incrementally identifies the sparsity pattern and a streamlined update rule built on hard thresholding operations, all without requiring prior knowledge of the underlying solution's sparsity level or extensive user parameter tuning (Yuan et al., 2013; Xu et al., 10 Jan 2026).

1. Problem Setting and Algorithmic Structure

GFHTP₁ addresses the general class of sparsity-constrained convex optimization problems

$$\min_x\ f(x) \quad \text{subject to} \quad \|x\|_0 \leq k,$$

where $f:\mathbb{R}^p\to\mathbb{R}$ is a smooth convex function and $k$ is a prescribed sparsity level (Yuan et al., 2013). In the context of robust signal recovery, it can be specialized to minimizing the $\ell_1$ (LAD) loss under a sparsity constraint,

$$\min_x\ \|b - Ax\|_1 \quad \text{subject to} \quad \|x\|_0 \leq s,$$

where $A\in\mathbb{R}^{m\times n}$ is known, $b$ is the observation vector, $x$ is the sparse parameter of interest, and $s$ is the (unknown) true sparsity (Xu et al., 10 Jan 2026).
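As a toy illustration of why the LAD loss confers outlier robustness, consider the scalar location model $b_i = x + e_i$ (i.e., $A$ is an all-ones column; this example is ours, not the paper's): the least-squares minimizer is the sample mean, while the LAD minimizer is the median.

```python
import numpy as np

# Toy location model b_i = x + e_i (A = all-ones column):
# least squares minimizes sum_i (b_i - x)^2 -> sample mean,
# LAD minimizes sum_i |b_i - x|             -> sample median.
b = np.array([1.0, 1.1, 0.9, 1.0, 100.0])  # one gross outlier
x_ls = b.mean()       # dragged far from 1 by the outlier
x_lad = np.median(b)  # essentially unaffected
```

A single gross outlier moves the least-squares estimate arbitrarily far, while the LAD estimate stays near the uncontaminated value.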

The GFHTP₁ update consists of two core steps per iteration:

  • A gradient or subgradient step appropriate to the loss $f$;
  • Hard thresholding to project onto the $k$-sparse set, using the operator $H_k(v)$, which keeps the $k$ entries of $v$ with largest absolute value and sets the remainder to zero.
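The operator $H_k$ requires only a linear-time selection rather than a full sort; a minimal NumPy sketch (the function name is ours):

```python
import numpy as np

def hard_threshold(v, k):
    """H_k(v): keep the k largest-magnitude entries of v, zero the rest.
    np.argpartition gives O(n) selection instead of an O(n log n) sort."""
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argpartition(np.abs(v), -k)[-k:]  # indices of k largest |v_i|
        out[idx] = v[idx]
    return out

print(hard_threshold(np.array([0.5, -3.0, 1.2, 0.1, 2.4]), 2))  # keeps -3.0 and 2.4
```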

In the "graded" version for robust LAD recovery, the support size is grown incrementally without knowledge of $s$; i.e., at outer iteration $k$ the support has size $k+1$, enabling automatic sparsity discovery (Xu et al., 10 Jan 2026).

2. Algorithmic Details and Implementation

GFHTP₁ for Smooth Convex Objectives

For $f$ smooth and convex, the update at iteration $t$ is:

  1. $\widetilde{x}^{(t)} = x^{(t-1)} - \eta\, \nabla f(x^{(t-1)})$
  2. $x^{(t)} = H_k(\widetilde{x}^{(t)})$
  3. Terminate when $\|x^{(t)} - x^{(t-1)}\| / \|x^{(t-1)}\| \leq \epsilon$.

No debiasing or grading step is used; this yields the "fast" variant compared to GraHTP with debiasing (Yuan et al., 2013).
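The three steps above can be sketched as follows (a minimal illustration with a least-squares loss; the function name, step size, and toy problem are ours, not the paper's):

```python
import numpy as np

def gfhtp_smooth(grad_f, x0, k, eta=1.0, eps=1e-10, max_iter=500):
    """Gradient step + hard thresholding onto the k-sparse set,
    stopping on small relative change of the iterate."""
    x = x0.copy()
    for _ in range(max_iter):
        v = x - eta * grad_f(x)                  # step 1: gradient step
        x_new = np.zeros_like(v)                 # step 2: H_k(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        x_new[idx] = v[idx]
        if np.linalg.norm(x_new - x) <= eps * max(np.linalg.norm(x), 1.0):
            return x_new                         # step 3: relative-change stop
        x = x_new
    return x

# Toy recovery: f(x) = 0.5 * ||Ax - b||^2 with a 3-sparse ground truth.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 200)) / np.sqrt(100)  # roughly unit-norm columns
x_true = np.zeros(200)
x_true[[5, 50, 150]] = [2.0, -3.0, 2.5]
b = A @ x_true
x_hat = gfhtp_smooth(lambda x: A.T @ (A @ x - b), np.zeros(200), k=3)
```

With a well-conditioned Gaussian design, the iterates lock onto the true support and then contract geometrically, as the theory in Section 3 predicts.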

GFHTP₁ for LAD with Outliers

For the LAD problem with high contamination, the update scheme involves:

  • Outer iteration: at step $k$, compute the quantile $\theta_\tau$ of the current residual magnitudes, identify "small" residuals, and mask the large outliers.
  • Subgradient step: use the truncated subgradient (weighted by the indicator of small residuals).
  • Support update: apply the $H_{k+1}$ operator to $x^k + t_{k+1,0}\, g$, where $g$ is the masked subgradient.
  • Inner loop: perform $L$ restricted subgradient steps on the candidate support $S^{k+1}$.
  • Step-size: scaled by the median/truncated norm and parameterized by $\mu$ (fixed, e.g., $\mu = 6$) for stability (Xu et al., 10 Jan 2026).

The process continues until the masked LAD loss drops below a tolerance. The procedure is parameter-free except for a moderate step-size constant $\mu$, with no need to specify the sparsity $s$ manually.
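One outer iteration can be sketched schematically as follows (an illustrative simplification, not the paper's exact algorithm: the step-size rule $t_{k+1,0}$, its scaling, and the inner-loop schedule are stand-ins, and all names are ours):

```python
import numpy as np

def graded_lad_step(A, b, x, k, tau=0.5, mu=6.0, L=5):
    """Schematic outer step k of the graded LAD scheme:
    quantile masking -> masked subgradient -> H_{k+1} -> inner loop on S."""
    n = A.shape[1]
    r = b - A @ x
    theta = np.quantile(np.abs(r), tau)              # residual-magnitude quantile
    mask = (np.abs(r) <= theta).astype(float)        # keep "small" residuals only
    g = A.T @ (mask * np.sign(r))                    # masked LAD subgradient (descent dir.)
    t0 = mu * theta / max(np.linalg.norm(g), 1e-12)  # illustrative step-size scaling
    v = x + t0 * g
    S = np.argpartition(np.abs(v), -(k + 1))[-(k + 1):]  # grade support to size k+1
    x_new = np.zeros(n)
    x_new[S] = v[S]
    for j in range(1, L + 1):                        # L restricted subgradient steps on S
        r = b - A @ x_new
        mask = (np.abs(r) <= np.quantile(np.abs(r), tau)).astype(float)
        x_new[S] += (t0 / j) * (A[:, S].T @ (mask * np.sign(r)))
    return x_new
```

An outer driver would call this for $k = 0, 1, 2, \dots$ until the masked LAD loss falls below the tolerance.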

3. Theoretical Guarantees

General Convex Case

GFHTP₁ achieves linear convergence under a graded restricted contraction condition $C(s, \zeta, \rho_s)$. Specifically, if the step-size satisfies $\eta < \zeta$ and $\mu_2 = 2[1-\eta/\zeta + (2-\eta/\zeta)\rho_s] < 1$, then

$$\|x^{(t)} - \bar{x}\|_2 \leq \mu_2^t\, \|x^{(0)} - \bar{x}\|_2 + \frac{2\eta}{1-\mu_2}\, \|[\nabla f(\bar{x})]_s\|_2,$$

yielding geometric convergence and exact finite-step recovery when $\bar{x}$ is an exact minimizer (Yuan et al., 2013).

Robust LAD Setting

Assuming the design matrix $A$ satisfies the restricted 1-isometry property (RIP₁) of appropriate order and the contamination fraction $p$ is bounded strictly below $1/2$, GFHTP₁ satisfies:

  • Global linear convergence: for any $s$-sparse signal $x_0$,

$$\|x^k - x_0\|_2 \leq \rho^k\, \|x^0 - x_0\|_2, \quad \rho < 1,$$

after $k$ outer steps, for suitable $\mu$ and quantile parameters.

  • Exact support/signal recovery for flat signals: for sufficiently "flat" $x_0$ and Gaussian $A$, exact support identification and reconstruction occur after $s$ outer steps with high probability (Xu et al., 10 Jan 2026).

The proof strategy leverages contraction from the inner subgradient updates and a support-matching induction relying on quantile concentration and RIP₁ structural control.

4. Computational Complexity and Practical Considerations

The per-iteration complexity of GFHTP₁ is as follows:

| Operation | Complexity (per iter) | Context |
| --- | --- | --- |
| Gradient or subgradient computation | $O(mn)$ | LAD and general convex loss |
| Quantile computation (LAD) | $O(m \log m)$ | Needed to mask outliers (LAD only) |
| Hard thresholding $H_k$ | $O(n \log k)$ or $O(n)$ | Partial sort or selection |
| Inner loop (LAD, support size $s$) | $O(sm + s \log s)$ | Per inner subgradient step |

For most settings, with $L$ inner steps and $k \lesssim s$, the total complexity is $O(smn)$ per run (Xu et al., 10 Jan 2026). Compared to PSGD and AIHT, GFHTP₁ is often faster when the true sparsity $s$ is unknown or the data is heavily contaminated.

Parameter selection is minimal: only the step-size scaling $\mu$ is user-set (a moderate value such as $\mu = 6$ suffices in experiments), and the quantile parameter $\tau$ is typically set near $0.5$. The stopping criterion is based on the relative iterate change or a cap on the number of iterations.

5. Empirical Performance and Applications

GFHTP₁ shows strong empirical performance on both synthetic and real data:

  • Synthetic sparse signal recovery: in experiments with $m = 1000$, $n = 5000$, sparsity $s = 5$–$15$, and outlier fractions $p = 0$–$0.5$, GFHTP₁ achieves near-perfect support recovery (success rate $1.00$) and low relative error, with CPU time competitive with or superior to other greedy algorithms (AIHT, PSGD).
  • MNIST image recovery: for image vectors observed through random Gaussian projections with $10\%$ outlier corruption, GFHTP₁ reaches an SNR above $80$ dB in roughly $9$ ms, compared to PSGD's SNR of roughly $5$ dB in over $1$ s.

These results underline the robustness, computational efficiency, and adaptivity of GFHTP₁ in outlier-prone and high-dimensional settings (Yuan et al., 2013; Xu et al., 10 Jan 2026).

6. Advantages, Limitations, and Extensions

GFHTP₁ provides the following core advantages:

  • No a priori knowledge of the sparsity $s$ is needed; the support grows automatically in graded fashion.
  • Nearly parameter-free, requiring only a single moderate step-size scale.
  • Provable linear convergence under RIP₁-like structural assumptions.
  • Robustness to large-magnitude outliers via adaptive masked subgradients.
  • Efficient for high-dimensional or contaminated regimes.

Limitations include the $O(m \log m)$ overhead of quantile computation on very large datasets, a restriction of the current theory to random Gaussian-type $A$, and possibly conservative theoretical step-size ranges.

Potential extensions outlined include:

  • Adaptive or line-searched step-size schemes.
  • Momentum or Nesterov acceleration in the inner optimization loops.
  • Generalizations to nonlinear or low-rank matrix recovery settings.
  • Distributed and online adaptations for streaming scenarios.
  • Data-driven automation for the quantile truncation parameter.

A plausible implication is the broader applicability of the "graded" hard thresholding paradigm to a range of nonconvex, high-dimensional, and robust estimation problems beyond the $\ell_1$ loss. However, extensions to structured or compressive measurement matrices, and rigorous theoretical guarantees for such cases, remain prominent avenues for investigation (Xu et al., 10 Jan 2026).
