Iteratively Reweighted Least Squares (IRLS)

Updated 29 January 2026
  • IRLS is an iterative optimization framework that uses majorization-minimization and weighted least squares to solve non-smooth, non-convex objective functions.
  • It approximates complex penalties like ℓ₁, ℓₚ quasi-norms, and composite regularizations by smoothing and updating weights in sequential quadratic subproblems.
  • IRLS is widely applied in robust regression, low-rank matrix recovery, and inverse imaging, offering both practical speed and provable convergence guarantees.

The Iteratively Reweighted Least Squares (IRLS) method is a majorization-minimization framework for solving non-smooth, non-convex regularized regression and matrix recovery problems. Rooted in robust statistics and signal processing, IRLS alternates between solving a sequence of quadratic (weighted least-squares) subproblems and updating the weights based on the current iterate, allowing efficient surrogation of non-quadratic objectives such as ℓ₁/sparsity, ℓₚ quasi-norms, group sparsity, rank/Schatten quasi-norms, and composite low-rank plus sparse penalties. IRLS generalizes from classical single-norm regularizations to heterogeneous combinations and delivers both practical speed and, in many cases, provable global or local convergence rates (Lu et al., 2014, Fornasier et al., 2015, Kümmerle et al., 2017, Kümmerle et al., 2020, Lerman et al., 25 Jun 2025, Kümmerle et al., 2023, Kümmerle et al., 2018, Ene et al., 2 Oct 2025).

1. Mathematical Formulation and General Mechanism

The IRLS methodology addresses minimization problems of the generic form

\min_{x\in\mathbb{R}^n}\;F(x),

where F is typically a combination of a smooth loss (e.g., squared error) with one or more non-smooth regularization terms enforcing sparsity, low-rank, or structured properties. The core challenge is that such terms are non-differentiable or non-convex (e.g., ℓ₁-norm, Schatten-p quasi-norm for p<1, mixed norm surrogates).

The principal IRLS steps are:

  • Smoothing: Non-smooth terms are approximated using a smoothing parameter (e.g., \mu or \varepsilon).
  • Weighted Quadratic Majorization: Quadratic upper bounds are constructed, yielding surrogates involving weighted least-squares in x (or Z for matrix problems).
  • Alternating Update: On each iteration, (a) solve the weighted least-squares subproblem; (b) update weights as functions of the current iterate (e.g., w_i^{(k)} = (|x_i^{(k)}|^2+\varepsilon^2)^{\alpha}).
  • Continuation: Smoothing parameters are reduced gradually, majorizing the original objective and yielding convergence to stationary points of the non-smooth problem in the limit.
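The four steps above can be sketched in a few lines. The following is a minimal illustrative implementation (function name, halving schedule for ε, and stopping rule are our own choices, not taken from any cited paper) for the smoothed ℓ₁-regularized least-squares problem:

```python
import numpy as np

def irls_l1(A, b, lam=0.1, eps0=1.0, n_iter=50, tol=1e-8):
    """IRLS sketch for min_x ||Ax - b||_2^2 + lam * ||x||_1,
    with |x_i| smoothed as sqrt(x_i^2 + eps^2)."""
    x = np.zeros(A.shape[1])
    eps = eps0
    AtA, Atb = A.T @ A, A.T @ b
    for _ in range(n_iter):
        # weights from the current iterate: w_i = (x_i^2 + eps^2)^{-1/2}
        w = 1.0 / np.sqrt(x**2 + eps**2)
        # quadratic surrogate minimizer: (2 A^T A + lam * diag(w)) x = 2 A^T b
        x_new = np.linalg.solve(2 * AtA + lam * np.diag(w), 2 * Atb)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
        eps = max(eps * 0.5, 1e-10)   # continuation: shrink the smoothing parameter
    return x
```

Each pass performs exactly the weight update and weighted least-squares solve described above; for large problems the dense `solve` would be replaced by an iterative solver.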

Specific problem types include:

  • Sparse regression: \min_x \|A x - b\|_2^2 + \lambda \|x\|_1, or the generalized \sum_k \lambda_k |x_k|^{q_k} (Voronin et al., 2015)
  • Low-rank matrix recovery: \min_X \|\Phi(X)-Y\|_2^2 + \lambda \|X\|_{S_p}^p (Kümmerle et al., 2017)
  • Mixed norm/structured objectives: e.g., joint minimization of low-rank and column-sparse structure via Schatten-p and \ell_{2,q} norms (Lu et al., 2014, Kümmerle et al., 2023)
  • Robust regression: IRLS is widely used for \ell_1 or truncated-loss minimization under arbitrary corruptions (Mukhoty et al., 2020, Ene et al., 2019)

2. Smoothing and Majorization-Minimization Principles

Non-smooth penalty terms are smoothed by introducing a small parameter, e.g., replacing |x| with (x^2 + \varepsilon^2)^{1/2} and |x|^p with (x^2 + \varepsilon^2)^{p/2}.

The majorization-minimization (MM) approach uses quadratic surrogates that upper-bound the penalty and touch the objective at the current iterate:

  • For \ell_p^p, tight majorizers are constructed via

|x|^p \leq \frac{p}{2} w(x_0)\, x^2 + \text{const}

where w(x_0) = (x_0^2+\varepsilon^2)^{(p-2)/2} (Lefkimmiatis et al., 2023, Koshelev et al., 2023).

Updating iterates via the minimizer of the quadratic surrogate ensures monotonic decrease of the objective. The smoothing/majorization proofs rely on the concavity of the penalty term (e.g., x^{p/2} for 0 < p < 1) (Lu et al., 2014, Kümmerle et al., 2017).
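The majorization and tangency properties are easy to verify numerically. The sketch below (illustrative; p = 0.5 and ε = 10⁻³ are arbitrary choices) builds the quadratic surrogate tangent at a point x₀ and checks that it upper-bounds the smoothed penalty everywhere:

```python
import numpy as np

def surrogate(x, x0, p=0.5, eps=1e-3):
    """Quadratic majorizer of the smoothed penalty (x^2 + eps^2)^{p/2},
    tangent at x0; the constant is fixed by the tangency condition."""
    w = (x0**2 + eps**2) ** ((p - 2) / 2)                    # IRLS weight at x0
    const = (x0**2 + eps**2) ** (p / 2) - (p / 2) * w * x0**2
    return (p / 2) * w * x**2 + const
```

By concavity of t ↦ t^{p/2} in t = x², the tangent in t lies above the function, which is exactly the majorization inequality displayed above.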

3. Algorithmic Structure and Pseudocode

The IRLS iteration typically takes the following form:

Generic IRLS Algorithm (Editor’s term)

  1. Given measurements y, operator A, regularization strengths, initial x^{(0)}, and smoothing parameter \varepsilon_0.
  2. Until convergence:

    • Form weights w_i^{(k)} from x^{(k)}.
    • Solve the weighted least-squares subproblem for x^{(k+1)} (possibly with the constraint A x = y), e.g.:

    x^{(k+1)} = \arg\min_{A x = y} \sum_i w_i^{(k)} x_i^2

    • Update the smoothing parameter (e.g., \varepsilon_{k+1} = \min(\varepsilon_k, \sigma_s(x^{(k+1)})/N), where \sigma_s is the best-s-term approximation error).
    • Continue until the change in x falls below the desired tolerance.
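In the noiseless, equality-constrained case the weighted least-squares step has a closed form, x = D A^T (A D A^T)^{-1} y with D = diag(1/w_i). A minimal Python sketch of the resulting loop (the 10⁻⁸ floor on ε and the (s+1)-th largest magnitude as a proxy for the best-s-term error are our implementation choices):

```python
import numpy as np

def irls_basis_pursuit(A, y, s, n_iter=100):
    """Sketch of noiseless IRLS for min ||x||_1 s.t. Ax = y,
    following the generic algorithm above (s = target sparsity)."""
    m, n = A.shape
    eps = 1.0
    x = np.linalg.lstsq(A, y, rcond=None)[0]        # feasible starting point
    for _ in range(n_iter):
        d = np.sqrt(x**2 + eps**2)                  # inverse weights 1/w_i
        # constrained weighted LS: x = D A^T (A D A^T)^{-1} y, D = diag(d)
        x = d * (A.T @ np.linalg.solve((A * d) @ A.T, y))
        # continuation: eps_{k+1} = min(eps_k, sigma_s(x)/n), with the
        # (s+1)-th largest magnitude as proxy; 1e-8 floor for stability
        r = np.sort(np.abs(x))[::-1]
        eps = max(min(eps, r[s] / n), 1e-8)
    return x
```

Every iterate stays feasible (A x = y) by construction, and ε decreases only when the iterate approaches an s-sparse vector, as in the update rule above.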

A matrix IRLS step solving the joint Schatten-p and column \ell_{2,q} penalty reads (Lu et al., 2014): p Z M + \lambda q X^T (X Z - X) N = 0, where

M = (Z^T Z + \mu^2 I)^{p/2-1},\quad N_{jj} = (\| (X Z - X)_j \|_2^2 + \mu^2)^{q/2-1}
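The weight matrices M and N can be formed directly from the current iterate. Below is an illustrative NumPy sketch (function name and defaults are ours) that computes the matrix power via an eigendecomposition of the symmetric positive definite Gram matrix:

```python
import numpy as np

def weight_matrices(X, Z, p=0.5, q=0.5, mu=1e-2):
    """Weight matrices for the joint Schatten-p / l_{2,q} IRLS step
    (sketch of the update in Lu et al., 2014)."""
    # M = (Z^T Z + mu^2 I)^{p/2 - 1}, via eigendecomposition of the SPD matrix
    G = Z.T @ Z + mu**2 * np.eye(Z.shape[1])
    evals, V = np.linalg.eigh(G)
    M = (V * evals ** (p / 2 - 1)) @ V.T
    # N is diagonal with N_jj = (||(XZ - X)_j||_2^2 + mu^2)^{q/2 - 1}
    R = X @ Z - X
    N = np.diag((np.sum(R**2, axis=0) + mu**2) ** (q / 2 - 1))
    return M, N
```

Adding μ²I before taking the (negative) matrix power keeps all eigenvalues strictly positive, so the weights remain bounded even when Z is rank-deficient.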

Algorithmic acceleration is often achieved via preconditioned conjugate gradient solvers (Fornasier et al., 2015, Chen et al., 2014), or embedding in bilevel or quasi-Newton frameworks for large-scale/ill-conditioned settings (Poon et al., 2021, Koshelev et al., 2023).

4. Convergence Theory

Convergence properties depend on the convexity (or strong convexity) of the smoothed objective and the underlying structure (e.g., null space property — NSP — of the measurement operator):

  • Global Convergence: For a convex smoothed objective (p, q \geq 1), IRLS converges globally to the unique minimizer (Lu et al., 2014, Kümmerle et al., 2020). For singleton minimizers in compressed sensing, a global linear rate is established for certain IRLS variants (Kümmerle et al., 2020, Ene et al., 2019, Ene et al., 2 Oct 2025).
  • Local Superlinear Convergence: For nonconvex Schatten-p minimization, HM-IRLS yields a local superlinear convergence rate of order 2-p (Theorem: \|X^{k+1} - X_0\|_{S_\infty} \leq C \|X^k - X_0\|_{S_\infty}^{2-p}) in the presence of a strong Schatten-p null space property (Kümmerle et al., 2017).
  • Global recovery in robust regression: Stagewise-truncated IRLS variants (STIR, STIR-GD) guarantee convergence from arbitrary initialization under sub-Gaussian covariates, for up to a breakdown fraction of adversarial corruptions (Mukhoty et al., 2020). This is achieved via stage-wise truncation and weighted strong convexity arguments.
  • Composite structures: IRLS can handle joint row-sparsity and low-rank structure via quadratic surrogates for both, with local quadratic convergence established under an (r, s)-RIP condition (Kümmerle et al., 2023).
  • Manifold settings: Deterministic global convergence of IRLS (FMS-DS) for robust subspace recovery on Grassmann manifolds is proved under geometric data conditions (Lerman et al., 25 Jun 2025).

5. Extensions and Applications

IRLS has broad impact across domains:

  • Compressed Sensing and Signal Recovery: ℓ₁ and ℓₚ minimization for sparse vectors under linear measurements — with phase transitions favoring IRLS over first-order methods in high-dimensional regimes (Fornasier et al., 2015, Kümmerle et al., 2020).
  • Low-Rank and Structured Matrix Completion: Recovery of matrices with low-rank or structured (e.g., Hankel, Toeplitz) constraints, with IRLS outperforming alternatives for minimal sample complexity and difficult frequency separation (Kümmerle et al., 2018, Kümmerle et al., 2017).
  • Mixed/Composite Regularization: Simultaneous recovery under multiple structure-inducing penalties, with IRLS offering empirical quadratic convergence and reduced need for manual regularization balancing (Lu et al., 2014, Kümmerle et al., 2023).
  • Robust Regression/Subspace Recovery: IRLS variants provably achieve global recovery and resistance to corruption, outperforming standard least squares and robust alternatives (Mukhoty et al., 2020, Lerman et al., 25 Jun 2025).
  • Statistical Estimation: Fast estimation of polyserial and polychoric correlations via iteratively reweighted regression on conditional expectations (Zhang et al., 2022).
  • Inverse Imaging Problems: Learned IRLS networks with analysis-based \ell_p or Schatten-p regularizers yield state-of-the-art restoration in deblurring, super-resolution, and demosaicking at low parameter counts, with memory-efficient implicit backpropagation via bilevel optimization (Lefkimmiatis et al., 2023, Koshelev et al., 2023).

6. Computational Aspects and Practical Implementation

Per-iteration IRLS cost is dominated by weighted least-squares solves, which can be efficiently handled via conjugate gradient methods, especially when measurement/operator matrices have fast multiplication (FFT, DCT) or sparsity. Preconditioning schemes and generator-space computations further reduce dimensionality (Chen et al., 2014, Kümmerle et al., 2018).
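A matrix-free conjugate-gradient solve of a generic IRLS subproblem only needs matrix-vector products with A, so fast operators (FFT, sparse) slot in directly. The sketch below is a hand-rolled CG, not the preconditioned solvers of the cited works:

```python
import numpy as np

def weighted_ls_cg(A, b, w, lam=1e-2, n_iter=200, tol=1e-10):
    """Conjugate gradients for the IRLS normal equations
    (A^T A + lam * diag(w)) x = A^T b, using only matvecs with A."""
    apply_H = lambda v: A.T @ (A @ v) + lam * (w * v)   # SPD system matrix
    x = np.zeros(A.shape[1])
    r = A.T @ b - apply_H(x)                            # initial residual
    p_dir = r.copy()
    rs = r @ r
    for _ in range(n_iter):
        Hp = apply_H(p_dir)
        alpha = rs / (p_dir @ Hp)
        x += alpha * p_dir
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p_dir = r + (rs_new / rs) * p_dir               # new conjugate direction
        rs = rs_new
    return x
```

Replacing `A @ v` and `A.T @ (...)` with fast operator applications (e.g., FFT-based convolutions) gives the per-iteration savings described above without ever forming A^T A.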

Algorithmic scalability:

  • Large-scale regression: CG-IRLS and PCG-accelerated variants maintain feasibility up to millions of variables (Fornasier et al., 2015).
  • Structured matrices: Generator-space IRLS for Hankel/block-Hankel structures achieves complexity O(n R \log n) per randomized SVD and O(n R^2) per matrix-vector product (Kümmerle et al., 2018).
  • Composite optimization: Multi-structure IRLS updates handle weights and surrogates in tandem with automatic parameter selection from current estimate, without needing explicit trade-off parameters (Kümmerle et al., 2023).
  • Neural network unrolling: IRLS steps can be recast as layers in recurrent networks, with convergence guarantees enabling implicit differentiation and memory-efficient training (Koshelev et al., 2023, Lefkimmiatis et al., 2023).

7. Limitations, Remedies, and Theoretical Developments

  • Failure modes: Classical IRLS can fail to converge at the critical sparsity order k = K under the NSP, as explicitly constructed in (Aravkin et al., 2019). Parameter updates involving the best-(K+1)-term error remedy this and restore global and local linear convergence.
  • Nonconvexity: PL-IRLS extends IRLS to nonconvex nonsmooth settings (e.g., |x_j|^\nu for \nu < 1), and global convergence is ensured via the Kurdyka-Łojasiewicz property and sufficiently regularized surrogates (Zhang et al., 2014).
  • Smooth variable projection: Bilevel reformulations yield smooth objectives without spurious minima, admitting superlinear convergence via quasi-Newton methods and efficient inner linear solves (Poon et al., 2021).

References

  • (Lu et al., 2014): Smoothed Low Rank and Sparse Matrix Recovery by Iteratively Reweighted Least Squares Minimization
  • (Fornasier et al., 2015): Conjugate gradient acceleration of iteratively re-weighted least squares methods
  • (Kümmerle et al., 2017): Harmonic Mean Iteratively Reweighted Least Squares for Low-Rank Matrix Recovery
  • (Kümmerle et al., 2020): Iteratively Reweighted Least Squares for Basis Pursuit with Global Linear Convergence Rate
  • (Ene et al., 2 Oct 2025): Improved ℓ_{p} Regression via Iteratively Reweighted Least Squares
  • (Lerman et al., 25 Jun 2025): Global Convergence of Iteratively Reweighted Least Squares for Robust Subspace Recovery
  • (Koshelev et al., 2023): Iterative Reweighted Least Squares Networks With Convergence Guarantees for Solving Inverse Imaging Problems
  • (Kümmerle et al., 2023): Recovering Simultaneously Structured Data via Non-Convex Iteratively Reweighted Least Squares
  • (Kümmerle et al., 2018): Denoising and Completion of Structured Low-Rank Matrices via Iteratively Reweighted Least Squares
  • (Zhang et al., 2022): Iteratively Reweighted Least Squares Method for Estimating Polyserial and Polychoric Correlation Coefficients
  • (Chen et al., 2014): Fast Iteratively Reweighted Least Squares Algorithms for Analysis-Based Sparsity Reconstruction
  • (Voronin et al., 2015): An Iteratively Reweighted Least Squares Algorithm for Sparse Regularization
  • (Lefkimmiatis et al., 2023): Learning Sparse and Low-Rank Priors for Image Recovery via Iterative Reweighted Least Squares Minimization
  • (Zhang et al., 2014): Proximal linearized iteratively reweighted least squares for a class of nonconvex and nonsmooth problems
  • (Ene et al., 2019): Improved Convergence for ℓ_∞ and ℓ_1 Regression via Iteratively Reweighted Least Squares
  • (Aravkin et al., 2019): IRLS for Sparse Recovery Revisited: Examples of Failure and a Remedy

IRLS constitutes a foundational algorithmic paradigm for structured recovery and regularization in high-dimensional inference, offering generalizable methodologies, rigorous theoretical foundations (often matching information-theoretic limits), scalable computation, and broad adaptability across statistical, optimization, and learning domains.
