Randomized Sketch-and-Project Methods

Updated 14 November 2025
  • Randomized sketch-and-project methods are iterative algorithms that use random projections to efficiently solve linear systems, least squares problems, and matrix equations.
  • They achieve global Q-linear convergence in expectation by unifying techniques from classical methods such as Kaczmarz and coordinate descent with modern random matrix theory.
  • Selecting appropriate sketching matrices (e.g., Gaussian, SRHT, sparse sketches) is key to balancing computational cost and embedding guarantees for large-scale problems.

Randomized sketch-and-project methods are a broad and powerful class of iterative randomized algorithms for solving linear systems, least squares problems, linear feasibility, matrix equations, and related tasks in scientific computing. These algorithms leverage random projections (sketching) to reduce computational and storage complexity while maintaining precise analytic control of convergence rates and solution quality. Developed and unified over the past decade, they bridge classical row- and coordinate-style methods such as Kaczmarz and coordinate descent, randomized block and Newton strategies, and modern subspace projection schemes tied to random matrix theory and optimal embedding guarantees.

1. Formulation and Unified Framework

Given a linear system $Ax = b$ with $A \in \mathbb{R}^{m \times n}$, randomized sketch-and-project methods proceed by iteratively projecting the current iterate $x^k$ onto the solution set of a randomly sketched subsystem. In its most general form, the update is

$$x^{k+1} = \arg\min_{x \in \mathbb{R}^n} \|x - x^k\|_W^2 \quad \text{subject to } S^T A x = S^T b,$$

where $W \succ 0$ is a user-chosen geometry (weight) matrix, $S \in \mathbb{R}^{m \times q}$ is a random sketching matrix (often $q \ll m$), and the norm is $\|z\|_W^2 = z^T W z$ (Gower et al., 2015).

This update admits multiple equivalent interpretations:

  • Sketch-and-project: Project $x^k$ in the $W$-norm onto the affine subspace defined by the sketched equations.
  • Random update: Closed form

$$x^{k+1} = x^k - W^{-1} A^T S \left(S^T A W^{-1} A^T S\right)^{\dagger} S^T (A x^k - b).$$

  • Random linear solve: Solve a small projected problem.
  • Randomized intersect and fixed-point forms: The update corresponds to intersection of affine spaces and fixed-point contraction.
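The closed-form update above can be sketched numerically in a few lines, assuming $W = I$ and a Gaussian sketch (the function name and test problem are illustrative, not from the cited papers):

```python
import numpy as np

def sketch_and_project_step(A, b, x, q, rng):
    """One sketch-and-project update with W = I and a Gaussian sketch S."""
    S = rng.standard_normal((A.shape[0], q))   # random sketching matrix
    SA = S.T @ A                               # sketched system (S^T A) x = S^T b
    r = SA @ x - S.T @ b                       # sketched residual
    # Closed form: x - A^T S (S^T A A^T S)^+ S^T (A x - b)
    return x - SA.T @ np.linalg.lstsq(SA @ SA.T, r, rcond=None)[0]

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
x_star = rng.standard_normal(20)
b = A @ x_star                                 # consistent system
x = np.zeros(20)
for _ in range(500):
    x = sketch_and_project_step(A, b, x, q=5, rng=rng)
```

Each step solves only a small $q \times q$ system, and for a consistent system the iterates converge linearly in expectation to the solution.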

Key classical methods arise as special cases:

  • Randomized Kaczmarz (RK): $W = I$, $S = e_i$ a random coordinate vector [projection onto a single row's hyperplane].
  • Randomized coordinate descent (CD): for $A \succ 0$, $W = A$ and $S = e_i$; for least squares, $W = A^T A$ and $S = A e_i$.
  • Randomized Newton/block Kaczmarz: $S$ a random block of columns of the identity, yielding block-row projections.
  • Gaussian Kaczmarz/pursuit: $S \sim N(0, I_m)$, $W = I$ (Gower et al., 2015, Gower, 2016).
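As a concrete special case, randomized Kaczmarz ($W = I$, $S = e_i$) with squared-row-norm sampling can be sketched as follows (illustrative implementation):

```python
import numpy as np

def randomized_kaczmarz(A, b, x0, iters, rng):
    """Sketch-and-project with W = I, S = e_i: project onto one row's hyperplane."""
    x = x0.copy()
    p = np.sum(A**2, axis=1) / np.linalg.norm(A, "fro")**2  # row-norm sampling
    for _ in range(iters):
        i = rng.choice(A.shape[0], p=p)
        a = A[i]
        x = x - (a @ x - b[i]) / (a @ a) * a   # project onto {x : a^T x = b_i}
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((300, 30))
x_star = rng.standard_normal(30)
b = A @ x_star
x = randomized_kaczmarz(A, b, np.zeros(30), iters=5000, rng=rng)
```

With this sampling distribution the classical expected per-iteration contraction is $1 - \sigma_{\min}^2(A)/\|A\|_F^2$ (Strohmer–Vershynin), a special case of the general rate below.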

For more complex problems such as matrix equations $AXB = C$, (block) sketch-and-project methods project in a (matrix-weighted) Frobenius norm onto the solution set of doubly-sketched equations, yielding updates of the form

$$X^{k+1} = \arg\min_{X} \|X - X^k\|_F^2 \quad \text{subject to } S^T A X B T = S^T C T,$$

where $S$ and $T$ are left and right sketches for $A$ and $B$ respectively (Bao et al., 2023).

2. Convergence Analysis and Rate Characterization

The fundamental convergence guarantee is global $Q$-linear convergence in expectation. Let

$$Z = A^T S \left(S^T A W^{-1} A^T S\right)^{\dagger} S^T A$$

be the (random) update matrix, and define the rate

$$\rho = 1 - \lambda_{\min}\!\left(W^{-1/2}\, \mathbb{E}[Z]\, W^{-1/2}\right).$$

Under mild rank assumptions, $0 \le \rho < 1$ and

$$\mathbb{E}\left[\|x^k - x^*\|_W^2\right] \le \rho^k\, \|x^0 - x^*\|_W^2$$

(Gower et al., 2015, Gower, 2016). The rate $\rho$ is governed by the worst-case contraction over the distribution of sketches, with larger block sizes and importance-sampling distributions improving the bound. For the matrix equation $AXB = C$, an analogous bound

$$\mathbb{E}\left[\|X^k - X^*\|_F^2\right] \le \rho^k\, \|X^0 - X^*\|_F^2$$

holds, with $\rho$ determined by the expected double-sketch projection (Bao et al., 2023).
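The rate $\rho$ can also be estimated numerically by Monte Carlo approximation of $\mathbb{E}[Z]$; the small experiment below assumes $W = I$ and Gaussian sketches and is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, q = 100, 15, 5
A = rng.standard_normal((m, n))

# Monte Carlo estimate of E[Z], where Z = A^T S (S^T A A^T S)^+ S^T A  (W = I)
EZ = np.zeros((n, n))
trials = 2000
for _ in range(trials):
    S = rng.standard_normal((m, q))
    SA = S.T @ A
    EZ += SA.T @ np.linalg.pinv(SA @ SA.T) @ SA
EZ /= trials

rho = 1 - np.linalg.eigvalsh(EZ).min()  # estimated contraction factor
```

Under the rank assumptions above the estimate satisfies $0 \le \rho < 1$, and increasing the sketch size $q$ shrinks $\rho$.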

For iterative sketch-and-project applied to preconditioned least squares, the spectrum of the sketched Gram matrix $A^T S S^T A$, for a random sketch $S$, plays a pivotal role. Embedding and convergence probabilities can be sharply predicted in the high-dimensional, tall-data limit using tools from random matrix theory, in particular the Tracy–Widom law for the largest eigenvalue of a Wishart matrix (Ahfock et al., 2022).

3. Sketching Matrix Choices and Embedding Guarantees

The efficacy of sketch-and-project methods depends sensitively on the choice of random sketching matrix $S$:

  • Gaussian: entries $S_{ij} \sim N(0, 1/q)$ i.i.d.
  • SRHT (Subsampled Randomized Hadamard Transform): $S^T = \sqrt{m/q}\, P H D$, for $H$ a (normalized) Hadamard matrix, $D$ a diagonal matrix of random signs, and $P$ a uniform row-sampling matrix.
  • Sparse sketches: Clarkson–Woodruff, CountSketch, and LESS embeddings (leverage-score-based sparsification with a small number of nonzeros per row) (Dereziński et al., 2022).
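The three families above can each be constructed in a few lines. This is a minimal sketch: the SRHT uses a Sylvester Hadamard matrix (so $m$ must be a power of two), and scalings are chosen so that $\mathbb{E}[S S^T] = I$:

```python
import numpy as np

def gaussian_sketch(m, q, rng):
    """Dense Gaussian sketch with E[S S^T] = I."""
    return rng.standard_normal((m, q)) / np.sqrt(q)

def srht_sketch(m, q, rng):
    """SRHT: S^T = sqrt(m/q) P (H/sqrt(m)) D, with m a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < m:                        # Sylvester construction of H
        H = np.block([[H, H], [H, -H]])
    D = rng.choice([-1.0, 1.0], size=m)          # random signs
    rows = rng.choice(m, size=q, replace=False)  # uniform row sampling P
    return (H[rows] * D).T / np.sqrt(q)          # shape (m, q); S^T S = (m/q) I

def countsketch(m, q, rng):
    """CountSketch: one random +/-1 entry per row of S (sparse)."""
    S = np.zeros((m, q))
    S[np.arange(m), rng.integers(0, q, size=m)] = rng.choice([-1.0, 1.0], size=m)
    return S

rng = np.random.default_rng(0)
S = srht_sketch(64, 32, rng)
```

An SRHT applied via the fast Walsh–Hadamard transform costs $O(m \log m)$ per vector versus $O(mq)$ for a dense Gaussian sketch, while CountSketch costs only $O(\mathrm{nnz})$; the dense construction here is for exposition only.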

A sketch $S$ is an $\varepsilon$-subspace embedding for $\mathrm{range}(A)$ if

$$(1 - \varepsilon)\,\|Ax\|_2^2 \;\le\; \|S^T A x\|_2^2 \;\le\; (1 + \varepsilon)\,\|Ax\|_2^2 \quad \text{for all } x \in \mathbb{R}^n,$$

or equivalently all eigenvalues of $U^T S S^T U$ lie in $[1 - \varepsilon,\, 1 + \varepsilon]$, where $U$ is an orthonormal basis for $\mathrm{range}(A)$.
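The embedding condition can be verified directly from the eigenvalues of $U^T S S^T U$; an illustrative check with a Gaussian sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, q = 2000, 10, 200
A = rng.standard_normal((m, n))
U, _, _ = np.linalg.svd(A, full_matrices=False)  # orthonormal basis of range(A)

S = rng.standard_normal((m, q)) / np.sqrt(q)     # Gaussian sketch, E[S S^T] = I
eigs = np.linalg.eigvalsh(U.T @ S @ S.T @ U)     # should all lie in [1-eps, 1+eps]
eps = max(eigs.max() - 1.0, 1.0 - eigs.min())    # observed distortion
```

For Gaussian sketches the observed distortion concentrates near $2\sqrt{n/q} + n/q$, consistent with the edge of the Wishart eigenvalue distribution.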

The minimum sketch size $q$ needed to achieve a given distortion $\varepsilon$ and failure probability $\delta$ can be characterized asymptotically using Tracy–Widom fluctuations: the extreme eigenvalues of $U^T S S^T U$, appropriately centered and scaled, follow the Tracy–Widom(1) law, so that approximately

$$\Pr\big(\lambda_{\max}(U^T S S^T U) > 1 + \varepsilon\big) \approx 1 - F_1\!\big((1 + \varepsilon - \mu_{q,n})/\sigma_{q,n}\big),$$

where $\mu_{q,n}, \sigma_{q,n}$ are centering and scaling constants and $F_1$ is the Tracy–Widom(1) distribution function (Ahfock et al., 2022). For moderate block sizes $q$, the required sketch size can be much smaller than classical worst-case bounds suggest.

4. Relationship to Randomized SVD and Low-Rank Approximation

The same sketching/projection operators that appear in the iterative algorithms also drive the analysis of randomized SVD and low-rank approximation. For the random orthogonal projection $P$ onto the sketched subspace $\mathrm{range}(A^T S)$, the sketch-and-project contraction is governed by

$$\rho = 1 - \lambda_{\min}\!\left(\mathbb{E}[P]\right)$$

(in the appropriate weighted geometry), while the randomized SVD error is controlled by

$$\mathbb{E}\,\|A\,(I - P)\|_F^2,$$

the expected residual of projecting $A$ onto the sketched subspace.

The per-iteration convergence rate of sketch-and-project solvers is tightly linked to this low-rank approximation error: the expected progress per iteration is lower bounded in terms of the tail of the singular value spectrum of $A$, with super-linear convergence behavior when the spectrum decays rapidly (polynomial or exponential decay) (Dereziński et al., 2022). Sparse sketches (LESS, CountSketch, leverage-score sampling) retain essentially the same rate, up to lower-order error terms depending on the stable rank of $A$, even when the sketch density is radically reduced.

5. Asymptotic and Non-Asymptotic Rate Results

Classical non-asymptotic theory (e.g., [Tropp 2011]) shows that an $\varepsilon$-subspace embedding is achieved with $q = O\big((n + \log(1/\delta))/\varepsilon^2\big)$ for Gaussian sketches or, for the SRHT, $q = O\big((n + \log m)\log(n)/\varepsilon^2\big)$. The Tracy–Widom-based theory delivers accurate, sharp predictions for empirical failure rates and convergence probabilities in the "tall and thin" regime, showing that:

  • Much smaller sketch sizes $q$ often suffice in practice.
  • The empirical distribution of the extreme distortion closely matches the Tracy–Widom curve, even at moderate dimensions.
  • Block, LESS, and sub-Gaussian sketches behave nearly identically when $q$ is sufficiently large and the leverage scores of $A$ are small (Ahfock et al., 2022, Dereziński et al., 2022).

6. Implementation Guidelines and Practical Strategies

Practical implementation proceeds by:

  1. Fixing the target distortion $\varepsilon$ and failure probability $\delta$.
  2. Solving for the minimum sketch size $q$ that guarantees the desired rate (using explicit Tracy–Widom or surrogate spectral bounds).
  3. Choosing $S$ as a Gaussian, SRHT, or Clarkson–Woodruff/LESS embedding, as appropriate for the computational constraints.
  4. For iterative methods, simulating the theoretical rate curves to confirm sharpness and to optimize the trade-off between per-iteration cost and overall runtime.
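When closed-form Tracy–Widom constants are unavailable, the sketch-size selection step can be approximated by brute-force simulation: estimate the empirical embedding-failure rate over a grid of sketch sizes and take the smallest $q$ within the failure budget (all names and parameters below are illustrative):

```python
import numpy as np

def min_sketch_size(U, eps, delta, q_grid, trials, rng):
    """Smallest q in q_grid whose empirical embedding-failure rate is <= delta."""
    m, _ = U.shape
    for q in q_grid:
        fails = 0
        for _ in range(trials):
            S = rng.standard_normal((m, q)) / np.sqrt(q)
            eigs = np.linalg.eigvalsh(U.T @ S @ S.T @ U)
            if eigs.min() < 1 - eps or eigs.max() > 1 + eps:
                fails += 1
        if fails / trials <= delta:
            return q
    return None

rng = np.random.default_rng(5)
A = rng.standard_normal((1000, 8))
U, _, _ = np.linalg.svd(A, full_matrices=False)
q = min_sketch_size(U, eps=0.6, delta=0.1,
                    q_grid=[20, 40, 80, 160], trials=100, rng=rng)
```

This Monte Carlo surrogate is crude but mirrors the TW-based selection: the failure rate drops sharply once $q$ passes the spectral-edge threshold, so the grid search terminates at a sketch size close to the theoretical minimum.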

Empirical evaluations show that the Tracy–Widom-based selection of $q$ is within 5% of the empirical optimum, and that sparse sketches (with a small number of nonzeros per row) offer the same convergence behavior as dense ones for large problem dimensions (Ahfock et al., 2022, Dereziński et al., 2022).

Numerical results on large-scale real-world datasets (genetic data; iterative least-squares benchmarks) confirm that empirical embedding probabilities match the theoretical rates for Gaussian, SRHT, and sparse sketches, and demonstrate the failure of uniform sampling (which lacks the rotational invariance underlying the Wishart limit).

7. Extensions, Limitations, and Comparisons

  • The sketch-and-project formalism subsumes a variety of classical and modern randomized iterative methods, including Kaczmarz, block Kaczmarz, coordinate descent, block Newton, random pursuit, and randomized matrix inversion methods via projection onto sketched constraints (Gower et al., 2015, Gower, 2016).
  • For block sizes approaching the problem dimension, block and momentum-accelerated variants enable further empirical gains through efficient use of cache and parallel computation.
  • Uniform or leverage-score sketches that lack the rotational invariance underlying the Wishart limit may deviate from the Tracy–Widom predictions and typically underperform in high-coherence regimes.
  • The theoretical and experimental framework covers both one-shot sketching (randomized SVD, single-pass embedding) and online/iterative sketch-and-project methods, with rigorous non-asymptotic spectral and empirical convergence bounds.

In summary, randomized sketch-and-project methods provide a mathematically sharp, algorithmically versatile architecture for randomized linear algebra and optimization, where the interplay of random matrix theory, sketching constructions, and iterative projective updates yields both deep theoretical guarantees and robust, efficient large-scale solvers (Gower et al., 2015, Ahfock et al., 2022, Dereziński et al., 2022).
