Graded Fast Hard Thresholding Pursuit (GFHTP)
- GFHTP is an iterative greedy algorithm that recovers sparse signals by incrementally identifying the support with hard thresholding, without requiring prior sparsity knowledge.
- It robustly handles sparse regression with LAD loss by using masked subgradient steps and quantile-based outlier detection to ensure global linear convergence under RIP conditions.
- The algorithm demonstrates competitive empirical performance in high-dimensional settings, scaling efficiently and achieving fast recovery on synthetic datasets and MNIST image tasks.
Graded Fast Hard Thresholding Pursuit (GFHTP) is an iterative greedy selection algorithm for finding sparse solutions to underdetermined or highly contaminated linear systems, particularly under sparsity and outlier-robustness constraints. GFHTP applies to both general convex sparsity-constrained objectives and specialized statistical estimation tasks, notably sparse regression with least absolute deviations (LAD) loss. Its defining characteristics are a "graded" support scheme that incrementally identifies the sparsity pattern and a streamlined update rule comprising hard thresholding operations, all without requiring prior knowledge of the underlying solution’s sparsity level or extensive user parameter tuning (Yuan et al., 2013, Xu et al., 10 Jan 2026).
1. Problem Setting and Algorithmic Structure
GFHTP addresses the general class of sparsity-constrained convex optimization problems
$$\min_{x \in \mathbb{R}^d} f(x) \quad \text{subject to} \quad \|x\|_0 \le k,$$
where $f$ is a smooth convex function and $k$ is a prescribed sparsity level (Yuan et al., 2013). In the context of robust signal recovery, it can be specialized to minimizing the $\ell_1$ (LAD) loss under a sparsity constraint,
$$\min_{x \in \mathbb{R}^d} \|y - Ax\|_1 \quad \text{subject to} \quad \|x\|_0 \le s,$$
where $A \in \mathbb{R}^{n \times d}$ is known, $y \in \mathbb{R}^n$ is the observation, $x$ the sparse parameter of interest, and $s$ the (unknown) true sparsity (Xu et al., 10 Jan 2026).
The GFHTP update consists of two core steps per iteration:
- Gradient or subgradient step appropriate to the loss $f$;
- Hard thresholding to project onto the $k$-sparse set, using the operator $H_k(\cdot)$, which keeps the $k$ entries of its argument with largest absolute value and sets the remainder to zero.

In the "graded" version for robust LAD recovery, the support size is incrementally grown without knowledge of $s$; i.e., at outer iteration $t$, the support is of size $t$, enabling automatic sparsity discovery (Xu et al., 10 Jan 2026).
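As an illustration, the hard thresholding operator described above can be sketched in a few lines of NumPy (the function name is ours):

```python
import numpy as np

def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest.

    Illustrative implementation of the hard thresholding operator H_k;
    np.argpartition finds the top-k indices by |x_i| in expected O(d) time.
    """
    z = np.zeros_like(x)
    if k <= 0:
        return z
    top_k = np.argpartition(np.abs(x), -k)[-k:]
    z[top_k] = x[top_k]
    return z
```

For example, `hard_threshold(np.array([3.0, -1.0, 0.5, -4.0]), 2)` keeps only the entries `3.0` and `-4.0`. In the graded variant, `k` is simply the outer-iteration counter `t`.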
2. Algorithmic Details and Implementation
GFHTP for Smooth Convex Objectives
For $f$ smooth and convex, the update at iteration $t$ is:
$$x^{t+1} = H_k\big(x^t - \eta \nabla f(x^t)\big).$$
- Terminate when $x^{t+1} = x^t$ (or the iterate change falls below a tolerance).
No debiasing or grading step is used; this yields the "fast" variant compared to GraHTP with debiasing (Yuan et al., 2013).
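A minimal sketch of this fast variant for the smooth least-squares loss $f(x) = \tfrac{1}{2}\|Ax - y\|^2$ follows; the function name, step size, and stopping defaults are our assumptions, not the paper's:

```python
import numpy as np

def fast_htp(A, y, k, eta, max_iter=300, tol=1e-8):
    """Fast (no-debiasing) hard thresholding sketch for f(x) = 0.5||Ax - y||^2.

    Update: x <- H_k(x - eta * grad f(x)); stop when the iterate stalls.
    Convergence requires eta below 1 / ||A^T A||_2 (monotone descent regime).
    """
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        g = A.T @ (A @ x - y)                     # gradient of the LS loss
        v = x - eta * g                           # gradient step
        top_k = np.argpartition(np.abs(v), -k)[-k:]
        x_new = np.zeros_like(x)
        x_new[top_k] = v[top_k]                   # hard threshold to k-sparse set
        if np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x), 1.0):
            x = x_new
            break
        x = x_new
    return x
```

On a well-conditioned Gaussian design with a few nonzeros, this loop drives the residual down rapidly while keeping every iterate exactly $k$-sparse.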
GFHTP for LAD with Outliers
For the LAD problem with high contamination, the update scheme involves:
- Outer iteration: At step $t$, compute a quantile of the current absolute residuals $|y - Ax^t|$, identify the "small" residuals, and mask the large outliers.
- Subgradient step: Use the truncated subgradient $g^t$ (the LAD subgradient weighted by the indicator of small residuals).
- Support update: Apply the operator $H_t$ to $x^t - \eta g^t$, where $g^t$ is the masked subgradient.
- Inner loop: Perform restricted subgradient steps on the candidate support $S^t$.
- Step-size: Scaled by a median/truncated residual norm, parameterized by a constant $c$ (fixed, e.g., $c = 6$) for stability (Xu et al., 10 Jan 2026).
The process continues until the masked LAD loss drops below a tolerance. The entire procedure is essentially parameter-free: only a moderate-scale step-size constant $c$ must be set, and there is no need for manual specification of the sparsity $s$.
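The outer/inner structure above can be sketched as follows. The paper's exact step-size formula, quantile handling, and inner-loop schedule differ in detail, so everything here (names, defaults, the specific median-based step size) should be read as an illustrative approximation:

```python
import numpy as np

def graded_lad_htp(A, y, n_outer, tau=0.5, c=6.0, n_inner=10):
    """Illustrative sketch of the graded LAD scheme: quantile-masked
    subgradient step, hard threshold to a support of size t, then a few
    restricted subgradient steps on that support."""
    d = A.shape[1]
    x = np.zeros(d)
    for t in range(1, n_outer + 1):
        r = y - A @ x
        mask = np.abs(r) <= np.quantile(np.abs(r), tau)   # keep "small" residuals
        g = -A[mask].T @ np.sign(r[mask])                 # masked LAD subgradient
        eta = c * np.median(np.abs(r)) / (np.linalg.norm(g) + 1e-12)
        v = x - eta * g
        supp = np.argpartition(np.abs(v), -t)[-t:]        # grade: support size t
        x = np.zeros(d)
        x[supp] = v[supp]
        for j in range(n_inner):                          # restricted refinement
            r = y - A @ x
            mask = np.abs(r) <= np.quantile(np.abs(r), tau)
            g_s = -A[mask][:, supp].T @ np.sign(r[mask])
            x[supp] -= eta / (j + 1) * g_s                # diminishing inner steps
    return x
```

Note how the support size is tied to the outer counter `t`, so the sparsity level is discovered rather than supplied.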
3. Theoretical Guarantees
General Convex Case
GFHTP achieves linear convergence under a graded restricted contraction condition (Condition C). Specifically, if the step-size $\eta$ and sparsity level $k$ are chosen so that Condition C holds with contraction factor $\rho < 1$, then
$$\|x^{t+1} - \bar{x}\| \le \rho \, \|x^t - \bar{x}\|,$$
yielding geometric convergence and exact finite-step recovery when $\bar{x}$ is an exact $k$-sparse minimizer (Yuan et al., 2013).
Robust LAD Setting
Assuming the design matrix $A$ satisfies the restricted 1-isometry property (RIP) of appropriate order and the contamination fraction is bounded strictly below $1/2$, GFHTP shows:
- Global linear convergence: For any $s$-sparse signal $x^\star$, the estimation error $\|x^t - x^\star\|$ contracts geometrically across outer steps, for suitable step-size and quantile parameters.
- Exact support/signal recovery for flat signals: For sufficiently "flat" signals $x^\star$ (nonzero entries of comparable magnitude) and Gaussian designs $A$, exact support identification and reconstruction occur after $s$ outer steps with high probability (Xu et al., 10 Jan 2026).
The proof strategy leverages contraction from inner subgradient updates and a support-matching induction relying on quantile concentration and RIP structural control.
4. Computational Complexity and Practical Considerations
The per-iteration complexity of GFHTP is as follows:
| Operation | Complexity (per iter) | Context |
|---|---|---|
| Gradient or subgradient computation | $O(nd)$ | LAD and general convex loss |
| Quantile computation (LAD) | $O(n)$ | Needed to mask outliers (LAD only) |
| Hard thresholding | $O(d \log d)$ or $O(d)$ | Full sort, or partial sort/selection |
| Inner loop (LAD, support size $t$) | $O(nt)$ | Per inner subgradient step |
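The "partial sort/selection" entry can be checked directly: selection via `np.argpartition` returns the same top-$k$ index set as a full $O(d \log d)$ sort, in expected $O(d)$ time:

```python
import numpy as np

# Compare full-sort top-k against selection-based top-k on a random vector.
rng = np.random.default_rng(42)
v = rng.standard_normal(10_000)
k = 25

top_by_sort = set(np.argsort(np.abs(v))[-k:])              # O(d log d) full sort
top_by_select = set(np.argpartition(np.abs(v), -k)[-k:])   # expected O(d) selection
assert top_by_sort == top_by_select
```

With continuous random entries, ties occur with probability zero, so the two index sets coincide.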
For most settings, with a small, fixed number of inner steps per outer stage, the overall cost is dominated by the $O(nd)$ (sub)gradient computations, yielding roughly $O(s \cdot nd)$ total work per run (Xu et al., 10 Jan 2026). Compared to PSGD and AIHT, GFHTP is often faster when the true sparsity is unknown or the data is heavily contaminated.
Parameter selection is minimal: only the step-size scaling constant $c$ is user-set (in a moderate theoretical range; e.g., $c = 6$ suffices in experiments), and the quantile parameter $\tau$ is typically set near $0.5$. The stopping criterion is based on the relative iterate change, or on a cap on the maximum number of iterations.
5. Empirical Performance and Applications
GFHTP shows strong empirical performance on both synthetic and real data:
- Synthetic sparse signal recovery: In experiments with Gaussian designs across a range of problem sizes, sparsity levels up to $15$, and outlier fractions up to $0.5$, GFHTP achieves near-perfect support recovery (success rate $1.00$) and low relative error, with CPU time competitive with or superior to other greedy algorithms (AIHT, PSGD).
- MNIST image recovery: For image vectors observed through random Gaussian projections with outlier corruption, GFHTP reaches an SNR of 80 dB in 9 ms, compared to PSGD's SNR of 5 dB in over 1 s.
These results underline the robustness, computational efficiency, and adaptivity of GFHTP in outlier-prone and high-dimensional settings (Xu et al., 10 Jan 2026, Yuan et al., 2013).
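For reference, the SNR figures quoted above follow the usual reconstruction-SNR definition in decibels; a quick sketch (the helper name is ours):

```python
import numpy as np

def snr_db(x_true, x_hat):
    """Reconstruction SNR in decibels: 20 * log10(||x|| / ||x - x_hat||)."""
    return 20.0 * np.log10(
        np.linalg.norm(x_true) / np.linalg.norm(x_true - x_hat)
    )
```

Under this definition, an estimate with relative error $10^{-4}$ scores 80 dB, the scale reported for GFHTP on MNIST.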
6. Advantages, Limitations, and Extensions
GFHTP provides the following core advantages:
- No a priori knowledge of sparsity needed; automatic graded support growth.
- Nearly parameter-free, requiring only a single moderate step-size scale.
- Provable linear convergence under RIP-like structural assumptions.
- Robustness to large-magnitude outliers via adaptive masked subgradients.
- Efficient for high-dimensional or contaminated regimes.
Limitations include the overhead of quantile computation for very large datasets, a restriction to random Gaussian-type designs in the current theory, and possibly conservative theoretical step-size ranges.
Potential extensions outlined include:
- Adaptive or line-searched step-size schemes.
- Momentum or Nesterov acceleration in the inner optimization loops.
- Generalizations to nonlinear or low-rank matrix recovery settings.
- Distributed and online adaptations for streaming scenarios.
- Data-driven automation for the quantile truncation parameter.
A plausible implication is the broader applicability of the "graded" hard thresholding paradigm to a range of nonconvex, high-dimensional, and robust estimation problems beyond the LAD ($\ell_1$) loss. However, extensions to structured or compressive measurement matrices and rigorous theoretical guarantees for such cases remain prominent avenues for investigation (Xu et al., 10 Jan 2026).