Randomized Sketching + Power Iteration
- Randomized sketching plus power iteration is a technique for efficiently obtaining near-optimal low-rank approximations of matrices and tensors by iteratively amplifying dominant subspaces.
- Its algorithmic variants—including block Krylov methods and two-sided tensor approaches with fast transforms—provide rigorous error bounds and improved convergence rates.
- The methodology finds extensive application in streaming data analysis, PCA, and tensor decompositions, delivering practical performance gains and unified subspace estimation.
Randomized sketching combined with power iteration is a central methodology for scalable, high-accuracy low-rank matrix and tensor approximation. This paradigm enhances the effectiveness of random projections by iteratively amplifying dominant subspaces, resulting in both theoretical guarantees and practical algorithms with strong performance on large-scale data. Recent developments unify streaming, block, and transformed-domain approaches, extend to tensor decompositions, and include error certification via canonical angles. The following sections survey the principles, algorithms, error analyses, and major applications of this synthesis, as reflected in contemporary research.
1. Core Principles and Scope
The foundational objective is efficient, near-optimal low-rank approximation for matrices and tensors. Given a matrix $A \in \mathbb{R}^{m \times n}$ and target rank $k$, the goal is to construct an explicit sketch or projection operator of lower dimension ($\ell \times n$ or $m \times \ell$, with $k \le \ell \ll \min(m,n)$) so that the induced projection approximates $A$ near-optimally in spectral or Frobenius norm, i.e.,

$$\|A - QQ^\top A\|_{\xi} \le (1+\varepsilon)\,\|A - A_k\|_{\xi}$$

for $\xi \in \{2, F\}$, where $A_k$ denotes the best rank-$k$ approximation; analogous forms hold for tensors. Randomized sketching leverages a random projection $\Omega \in \mathbb{R}^{n \times \ell}$ (e.g., Gaussian, SRHT, CountSketch), producing $Y = A\Omega$. Power iteration, or its block/Krylov variants, repeatedly multiplies by $AA^\top$ (or an analogue), amplifying the top singular directions:

$$Y = (AA^\top)^q A\Omega.$$

The method extends beyond matrices to high-order data: tensor CP (Wang et al., 2015), Tucker decompositions (Dong et al., 2023), and transformed-domain approaches (e.g., DCT-Gaussian-PI (Cheng et al., 2024)).
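The basic recipe (sketch, amplify via power iterations, then solve a small projected SVD) can be written in a few lines of NumPy. The function name, oversampling default $p$, and iteration count $q$ below are illustrative choices, not prescribed by any cited paper:

```python
import numpy as np

def randomized_lowrank(A, k, p=10, q=2, seed=0):
    """Rank-k approximation via a Gaussian sketch plus q power iterations."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Y = A @ rng.standard_normal((n, k + p))   # sketch: Y = A @ Omega
    Q, _ = np.linalg.qr(Y)
    for _ in range(q):                        # amplify dominant subspace,
        Q, _ = np.linalg.qr(A.T @ Q)          # re-orthonormalizing each
        Q, _ = np.linalg.qr(A @ Q)            # half-step for stability
    B = Q.T @ A                               # small (k+p) x n projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]     # truncated rank-k factors
```

The per-half-step QR is the standard numerical safeguard: without it, repeated multiplication by $AA^\top$ quickly loses the small singular directions to floating-point cancellation.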
2. Algorithmic Designs and Variants
Randomized sketching plus power iteration appears in several algorithmic forms:
- Simultaneous Power Iteration (SPI) and Block Krylov Iteration (BKI): Build increasingly accurate Krylov subspaces

$$K = \big[\,A\Omega,\ (AA^\top)A\Omega,\ \dots,\ (AA^\top)^q A\Omega\,\big],$$

followed by QR and spectral (Rayleigh–Ritz) projections (Musco et al., 2015).
- Streaming and Deterministic Hybrids: The r-BKIFD algorithm fuses randomized Block Krylov compression (with oversampled random projections and power iterations per block) with Frequent Directions shrinkage in the streaming model (Wang et al., 2021).
- Two-Sided and Transformed-Domain Methods for Tensors: DCT-Gaussian-Sketch-PI employs two-sided sketches and power iterations after transforming tensor “tubes” with 1D DCT, then performing block Krylov iterations in the transform domain (Cheng et al., 2024).
- Sketch-and-Precondition via Inverse Iteration: The EPSI method uses sketched preconditioners (e.g., Nyström approximations) to accelerate convergence in subspace iterations (Xu et al., 11 Feb 2025).
- Power-Iterated Tucker and CP Decompositions: Multimode power iterations, with random sketches on each mode, systematically drive the core error to near-optimality in tensors (Dong et al., 2023, Wang et al., 2015).
Each approach is parametrized by block/sketch size ($\ell$, $b$), iteration count ($q$), and, when appropriate, additional oversampling to maintain numerical stability.
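As a hedged illustration of the BKI variant above, the following NumPy sketch stacks all intermediate Krylov blocks before a Rayleigh–Ritz projection; the function name, default block size, and iteration count are our choices, not fixed by Musco et al. (2015):

```python
import numpy as np

def block_krylov_svd(A, k, q=3, b=None, seed=0):
    """Top-k SVD from the Krylov block [AG, (AA^T)AG, ..., (AA^T)^q AG]."""
    rng = np.random.default_rng(seed)
    b = b if b is not None else k + 5            # mildly oversampled block
    V = A @ rng.standard_normal((A.shape[1], b))
    blocks = [V]
    for _ in range(q):                           # keep every intermediate block
        V = A @ (A.T @ V)                        # (production code would also
        blocks.append(V)                         #  orthonormalize each block)
    Q, _ = np.linalg.qr(np.concatenate(blocks, axis=1))
    # Rayleigh-Ritz: an SVD of the projected matrix recovers top directions
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```

Compared with SPI, the only structural difference is that all $q+1$ blocks enter the QR factorization instead of the final one alone, which is exactly where the Chebyshev-style acceleration comes from.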
3. Theoretical Error Bounds and Spectral Guarantees
The combination of sketching and power iteration underlies strong, often gap-independent spectral bounds:
- Block Krylov + Sketching achieves a $(1+\varepsilon)$-approximation in spectral norm in $q = \Theta(\log(n)/\sqrt{\varepsilon})$ iterations, improving the $\Theta(\log(n)/\varepsilon)$ rate of basic SPI (via Chebyshev polynomial acceleration) (Musco et al., 2015, Wang et al., 2015). The improvement comes from retaining all intermediate Krylov vectors, which align better with the optimal subspace.
- Streaming/Frequent Directions Extensions: For r-BKIFD, the main spectral guarantee bounds the covariance error $\|A^\top A - B^\top B\|_2$ by the optimal rank-$k$ tail energy plus an additive term that shrinks with the number of power iterations per block; the required iteration count depends on the singular value gap, but gap-independent convergence is also achievable (Wang et al., 2021).
- Posterior and Canonical Angle Bounds: Rigorous a priori and a posteriori bounds for the canonical (principal) angles between computed and true top- subspaces are now available, accounting for the balance between sketch size and power iterations (Dong et al., 2022).
- Tensor Regimes: For DCT-Gaussian-Sketch-PI, the expected error after $q$ power iterations is controlled by the transformed-domain tail energy with each residual singular value raised to the $(2q+1)$-th power, so the effective spectrum decays as the $(2q+1)$-th power of the original (Cheng et al., 2024).
- Sketch-and-Preconditioned Inverse Iteration: EPSI yields mixed linear–quadratic convergence with rate improving linearly in the sketch size (Xu et al., 11 Feb 2025).
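The canonical-angle certification idea can be made concrete: the snippet below computes principal angles between a computed sketch basis and a reference subspace via the SVD of their cross-Gram matrix. This is the generic textbook construction, not the specific estimators of Dong et al. (2022):

```python
import numpy as np

def canonical_angles(Q, U_true):
    """Principal angles (radians) between range(Q) and range(U_true),
    both assumed to have orthonormal columns."""
    # Singular values of U_true^T Q are the cosines of the canonical angles.
    c = np.linalg.svd(U_true.T @ Q, compute_uv=False)
    return np.arccos(np.clip(c, -1.0, 1.0))
```

On a matrix with geometrically decaying spectrum, the maximal angle between the sketched basis and the true top-$k$ subspace drops by orders of magnitude as $q$ grows, matching the $(\sigma_{\ell+1}/\sigma_k)^{2q+1}$-type bounds quoted above.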
4. Computational Costs, Trade-Offs, and Practical Regimes
Randomized sketching plus power iteration separates the computational cost into three main components: forming the sketch (matvecs, sometimes in transformed domains), iterative power/Krylov expansion, and small-scale SVD/QR factorization.
- Per-iteration costs (e.g., for SPI/BKI) are $O(\mathrm{nnz}(A)\,\ell)$, i.e., $O(mn\ell)$ for dense inputs; for block or transformed-tensor methods, additional Fourier/DCT costs may appear (Musco et al., 2015, Cheng et al., 2024).
- Pass-efficiency: Block Krylov is more I/O efficient, requiring only $O(\log(n)/\sqrt{\varepsilon})$ passes over the data (Musco et al., 2015).
- Oversampling and Block Size: Increasing the block/oversampling dimension (e.g., $\ell = k + p$ with $p > 0$) sharply reduces the error floor in the tail term (e.g., replacing $\sigma_{k+1}$ by the smaller $\sigma_{\ell+1}$ in the error expansion) (Wang et al., 2015).
- Accuracy–speed trade-off: Even a single power iteration ($q = 1$) improves the approximation error by orders of magnitude over single-pass sketching, with only a modest increase in cost (Wang et al., 2021, Dong et al., 2023). This regime is empirically optimal for many real and synthetic datasets where spectral decay is moderate or slow.
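The accuracy–speed trade-off is easy to observe on synthetic data. This minimal experiment (our setup, with illustrative oversampling of 5) measures the Frobenius projection error of one fixed Gaussian sketch as $q$ grows:

```python
import numpy as np

def sketch_error(A, k, q, seed=0):
    """Frobenius error of the rank-(k+5) projection after q power iterations,
    using the same fixed sketch for every q so only q varies."""
    rng = np.random.default_rng(seed)
    Y = A @ rng.standard_normal((A.shape[1], k + 5))
    for _ in range(q):
        Y = A @ (A.T @ Y)                  # no QR here: fine for tiny q
    Q, _ = np.linalg.qr(Y)
    return np.linalg.norm(A - Q @ (Q.T @ A))
```

On a matrix with slow polynomial singular-value decay, the error decreases monotonically from $q = 0$ to $q = 2$, approaching the optimal tail energy.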
5. Applications, Generalizations, and Empirical Outcomes
The methodology is pervasive across large-scale data processing:
- Streaming and Online Learning: r-BKIFD provides a single-pass, streaming-efficient update with deterministic (Frequent Directions) accuracy (Wang et al., 2021).
- Tensor Decompositions: CountSketch/FFT-based methods with power iterations enable robust, nearly input-size independent approximate CP and Tucker decompositions (Wang et al., 2015, Dong et al., 2023, Cheng et al., 2024).
- Principal Component Analysis and Low-Rank SVD: Block Krylov and subspace iteration variants yield per-vector (PCA) guarantees and outperform classical power methods, especially as problem dimensionality grows (Musco et al., 2015).
- Linear Systems and Eigenproblems: Power-iterated sketching is leveraged to accelerate Rayleigh-Ritz and GMRES-type algorithms by reducing the basis size via subspace embedding without loss of residual accuracy (Nakatsukasa et al., 2021).
- Empirical results reveal:
- Block Krylov methods converge in substantially fewer steps (e.g., $5$–$10$ iterations compared to $30$–$50$ for SPI to reach comparable error) (Musco et al., 2015).
- Streaming r-BKIFD matches or surpasses alternatives in both error and speed, particularly for high-dimensional or sparse data (Wang et al., 2021).
- Two-sided DCT-Gaussian-PI outperforms one-sided and baseline tensor sketches in both accuracy and CPU time, narrowing the gap to full SVD decompositions (Cheng et al., 2024).
6. Practical Choices and Typical Guidelines
Effective deployment of randomized sketching plus power iteration requires:
- Sketch Size $\ell$: Moderate oversampling, e.g., $\ell = k + 5$ to $k + 10$, is usually sufficient for near-optimal subspace capture. Larger $\ell$ further reduces tail error (Dong et al., 2022, Wang et al., 2015).
- Power Iterations $q$: One or two power iterations ($q = 1$ or $2$) are usually adequate, providing dramatic error reduction with little additional cost (Wang et al., 2021, Dong et al., 2023).
- Block Size and Orthogonalization: Oversample the block size ($b \ge k$), use block orthogonalization, and consider re-orthogonalization between iterations to control numerical stability (Musco et al., 2015).
- Choice of Sketch Operator: Fast transforms (SRHT), CountSketch, and Gaussian sketches are all viable, with CountSketch often fastest for sparse matrices (Wang et al., 2021, Nakatsukasa et al., 2021).
- Certification of Subspace Accuracy: Employ a posteriori canonical angle or residual bounds to certify accuracy post hoc; Monte Carlo estimators for canonical angle distributions are also practical (Dong et al., 2022).
- Streaming or Block Processing: Algorithms such as r-BKIFD accommodate streaming or blockwise data, maintaining deterministic shrinkage properties (Wang et al., 2021).
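As one simple post-hoc check in the spirit of the certification guideline above (a generic power-method residual estimator, not a method from the cited papers), the spectral norm of the residual $A - QQ^\top A$ can be estimated without a full SVD:

```python
import numpy as np

def residual_spectral_norm(A, Q, iters=30, seed=0):
    """Power-method estimate of ||A - QQ^T A||_2; the estimate approaches
    the true norm from below as iters grows."""
    rng = np.random.default_rng(seed)
    R = A - Q @ (Q.T @ A)                 # explicit residual; fine at moderate size
    x = rng.standard_normal(R.shape[1])
    x /= np.linalg.norm(x)
    for _ in range(iters):                # power iteration on R^T R
        x = R.T @ (R @ x)
        x /= np.linalg.norm(x)
    return np.linalg.norm(R @ x)
```

Because $\|Rx\| \le \|R\|_2$ for any unit vector $x$, the estimate is a certified lower bound at every iteration, which makes it a cheap sanity check on whether more power iterations or a larger sketch are needed.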
7. Extensions and Open Challenges
Emerging research has pushed the paradigm in several directions:
- Preconditioning Interpretation: Viewing sketching as a means to construct preconditioners within inverse iteration expands the application to more general iterative solvers and provides linear-in-sketch-size convergence rate improvements (Xu et al., 11 Feb 2025).
- Transformed Domains and Tunable Products: By leveraging invertible transforms (DCT, DFT), sketching and power-iteration can be transplanted to transformed coordinate systems, accelerating convergence for structured tensors (Cheng et al., 2024).
- Tensor Hashing and FFT Acceleration: For symmetric tensors, special colliding hashes and FFT-based contractions dramatically reduce runtime and memory in repeated power iterations (Wang et al., 2015).
- Analysis of Per-Vector Guarantees: Theoretical progress includes matching per-vector (PCA) error guarantees, not just global spectral/Frobenius bounds (Musco et al., 2015).
A plausible implication is that future research will further unify sketch-preconditioned iterative solvers with kernel methods, expand canonical angle certification, and develop adaptive parameter selection for block size and power iterations based on runtime residuals and spectral decay estimation.
References
- (Wang et al., 2021) An Improved Frequent Directions Algorithm for Low-Rank Approximation via Block Krylov Iteration
- (Musco et al., 2015) Randomized Block Krylov Methods for Stronger and Faster Approximate Singular Value Decomposition
- (Wang et al., 2015) Improved Analyses of the Randomized Power Method and Block Lanczos Method
- (Dong et al., 2022) Efficient Bounds and Estimates for Canonical Angles in Randomized Subspace Approximations
- (Dong et al., 2023) Practical Sketching Algorithms for Low-Rank Tucker Approximation of Large Tensors
- (Wang et al., 2015) Fast and Guaranteed Tensor Decomposition via Sketching
- (Cheng et al., 2024) An Efficient Two-Sided Sketching Method for Large-Scale Tensor Decomposition Based on Transformed Domains
- (Xu et al., 11 Feb 2025) What is a Sketch-and-Precondition Derivation for Low-Rank Approximation? Inverse Power Error or Inverse Power Estimation?
- (Nakatsukasa et al., 2021) Fast & Accurate Randomized Algorithms for Linear Systems and Eigenvalue Problems