
Connecting randomized iterative methods with Krylov subspaces

Published 27 May 2025 in math.NA (arXiv:2505.20602v1)

Abstract: Randomized iterative methods, such as the randomized Kaczmarz method, have gained significant attention for solving large-scale linear systems due to their simplicity and efficiency. Meanwhile, Krylov subspace methods have emerged as a powerful class of algorithms, known for their robust theoretical foundations and rapid convergence properties. Despite the individual successes of these two paradigms, their underlying connection has remained largely unexplored. In this paper, we develop a unified framework that bridges randomized iterative methods and Krylov subspace techniques, supported by both rigorous theoretical analysis and practical implementation. The core idea is to formulate each iteration as an adaptively weighted linear combination of the sketched normal vector and previous iterates, with the weights optimally determined via a projection-based mechanism. This formulation not only reveals how subspace techniques can enhance the efficiency of randomized iterative methods, but also enables the design of a new class of iterative-sketching-based Krylov subspace algorithms. We prove that our method converges linearly in expectation and validate our findings with numerical experiments.

Summary

  • The paper presents a new affine subspace formulation that bridges low-memory randomized iterative methods and full-scale Krylov methods, demonstrating linear convergence.
  • It introduces IS-Krylov algorithms using projection-based updates that balance memory usage with computational efficiency.
  • Empirical evaluations show that increasing the memory parameter ℓ accelerates convergence and enhances robustness for large-scale, ill-conditioned systems.

Connecting Randomized Iterative Methods with Krylov Subspaces: An Expert Analysis


Introduction

This paper, "Connecting randomized iterative methods with Krylov subspaces" (2505.20602), investigates the foundational interplay between two classes of algorithms for large-scale linear systems: Randomized Iterative Methods (RIMs) and Krylov Subspace Methods (KSMs). The authors develop an affine subspace framework that naturally interpolates between these approaches by adaptively adjusting the history parameter \ell. Utilizing projection-based updates, they present theoretical analysis, efficient implementation schemes, and empirical justification, culminating in a class of iterative-sketching-based Krylov (IS-Krylov) algorithms.


Unified Affine Subspace Formulation

The authors define a unified iterative scheme for solving Ax = b with A \in \mathbb{R}^{m \times n}:

x^{k+1} = \mathop{\arg\min}_{x \in \Pi_k} \|x - A^\dagger b\|_2^2,

where the affine subspace \Pi_k is

\operatorname{aff}\left\{x^{j_k}, \ldots, x^k,\ x^k - A^\top S_k S_k^\top (Ax^k - b)\right\},

with sketching matrices S_k drawn from a probability space, and j_k = \max\{k - \ell + 1, 0\}.

The truncation length \ell controls the memory and nature of the method:

  • \ell = 1: Reduces to standard stochastic iterative schemes (e.g., randomized Kaczmarz [strohmer2009randomized]), where only the most recent iterate and sketched gradient are used.
  • \ell = \infty: Recovers a full Krylov subspace method, as the update utilizes the span of all past iterates and sketched gradients.

This interpolation builds a rigorous bridge between the randomized (low-memory, low-cost) updates and history-augmented, subspace-expanding Krylov schemes.
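For concreteness, the \ell = 1 row-sampling instance is the classical randomized Kaczmarz method. The following minimal sketch is our own illustrative code (not from the paper); the function name and parameter choices are ours:

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=2000, seed=0):
    """The ell = 1, single-row instance: project the iterate onto one
    randomly chosen hyperplane a_i^T x = b_i, with rows sampled
    proportionally to ||a_i||^2 (Strohmer-Vershynin sampling)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    probs = np.sum(A * A, axis=1) / np.sum(A * A)
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        a = A[i]
        x = x - (a @ x - b[i]) / (a @ a) * a   # orthogonal projection step
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 10))
x_true = rng.standard_normal(10)
x = randomized_kaczmarz(A, A @ x_true)         # consistent system b = A x_true
assert np.linalg.norm(x - x_true) < 1e-6
```

Each step is an orthogonal projection onto one hyperplane, matching the \ell = 1 affine-subspace update with a single sampled row as the sketch.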


Algorithmic Realization and Complexity

Practical implementation is achieved via orthogonal projection calculations and leveraging structure for efficient computation. The core iteration reduces to solving a small, well-structured least-squares problem in the subspace generated by previous iterates and the current sketched normal.

Key elements:

  • Memory-efficient storage and update of the relevant subspace, exploiting affine independence.
  • Use of the Gram-Schmidt process and recursive matrix structures to economically compute the required orthogonal projections.
  • Complexity per iteration is linear in n and, for fixed \ell, avoids the quadratic or cubic costs typical of classical KSMs.

Algorithms are presented in two forms: a direct projection (RIM-AS) and an orthogonalized basis expansion (IS-Krylov).
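The projection step can be sketched compactly. The code below is our own illustrative memory-\ell scheme in the spirit of RIM-AS, not the paper's exact algorithm: it stores the last \ell sketched normal directions d_j = A^T u_j and exploits the identity d_j^T (x - A^\dagger b) = u_j^T (Ax - b), valid for consistent systems, to compute the projection without knowing A^\dagger b. The function name and all parameter choices are hypothetical:

```python
import numpy as np

def rim_as_sketch(A, b, ell=5, block=5, iters=500, seed=0):
    """Illustrative memory-ell projection scheme (our sketch, not the
    paper's RIM-AS). Keeps the last `ell` sketched normal directions
    d_j = A^T u_j, where u_j = S_j S_j^T r is a block-row sketch of the
    residual, and projects the error onto their span."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    D, U = [], []                         # stored directions and certificates
    for _ in range(iters):
        r = A @ x - b
        rows = rng.choice(m, size=block, replace=False)
        u = np.zeros(m)
        u[rows] = r[rows]                 # u = S_k S_k^T r for a row sketch
        d = A.T @ u                       # sketched normal direction
        if np.linalg.norm(d) < 1e-14:
            continue                      # skip a zero sketch
        D, U = (D + [d])[-ell:], (U + [u])[-ell:]
        Dm = np.column_stack(D)
        g = np.array([uj @ r for uj in U])            # = Dm^T (x - A^+ b)
        c = np.linalg.lstsq(Dm.T @ Dm, -g, rcond=None)[0]
        x = x + Dm @ c                    # closest point in the small subspace
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 12))
x_true = rng.standard_normal(12)
x = rim_as_sketch(A, A @ x_true)          # consistent system b = A x_true
assert np.linalg.norm(x - x_true) < 1e-6
```

The small least-squares problem solved each step involves only an \ell × \ell Gram matrix, consistent with the per-iteration cost being linear in n for fixed \ell.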


Theoretical Properties

Linear Convergence

A central theorem establishes that the proposed methods converge linearly in expectation:

\mathbb{E}\, \|x^{k+1} - A^\dagger b\|_2^2 \leq (1 - \rho_\ell)\, \|x^{k} - A^\dagger b\|_2^2,

where \rho_\ell is an explicit function of the properties of the sketching distribution and, crucially, grows (i.e., convergence accelerates) with \ell. For \ell \geq \mathrm{rank}(A), the method attains the exact solution in finitely many iterations in exact arithmetic.

Subspace Optimality and Acceleration

As \ell increases, the method has access to richer subspace information—incorporating more orthogonal components of the error—leading to strict improvement in the worst-case contraction. This framework formally subsumes and generalizes stochastic momentum (e.g., ASHBM [zeng2024adaptive]), inertial, and subspace-augmented stochastic approaches.

Algorithmic Equivalence

The authors prove the equivalence between the direct affine method and the IS-Krylov scheme, establishing that the two implementations produce identical iterates when the same sequence of sketching matrices is used, but with potentially different practical cost structures.


Connection to Krylov Subspace Methods

When the probability space degenerates to deterministic identity sketching and \ell \to \infty, the method reconstructs the classical KSM structure:

x^0 + \mathcal{K}_{k+1}\left(A^\top A,\ A^\top(Ax^0 - b)\right),

where the iterates evolve in the Krylov subspace generated by the system normal equations.

More generally, with iterative sketching (e.g., random blocks, Gaussian, SRHT), the method yields a new class of "IS-Krylov" algorithms, where the subspace is stochastically refined at each iteration. This perspective produces stochastic analogs of classical algorithms (e.g., CGNE, GMRES) and naturally explains the effect of increasing memory (subspace dimension) on convergence.
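The degenerate S = I case can be checked numerically. The toy script below (our own illustration, with arbitrary problem sizes) runs gradient steps on the normal equations, whose search directions are exactly A^\top(Ax - b), and verifies that the iterate stays inside the Krylov subspace \mathcal{K}_k(A^\top A, A^\top(Ax^0 - b)):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
b = rng.standard_normal(30)
x = np.zeros(8)
M = A.T @ A
g0 = A.T @ (A @ x - b)                 # initial normal-equations gradient

# Stand-in for the S = I, full-memory case: plain gradient steps on the
# normal equations, whose directions are A^T (A x - b).
t = 1.0 / np.linalg.norm(M, 2)         # a conservative step size
k = 5
for _ in range(k):
    x = x - t * A.T @ (A @ x - b)

# Orthonormal basis of K_k(M, g0) via Gram-Schmidt (Lanczos-style).
V = [g0 / np.linalg.norm(g0)]
for _ in range(k - 1):
    w = M @ V[-1]
    for q in V:
        w = w - (q @ w) * q
    V.append(w / np.linalg.norm(w))
Q = np.column_stack(V)

# Since x^0 = 0, the k-th iterate itself must lie in the Krylov subspace.
assert np.linalg.norm(x - Q @ (Q.T @ x)) < 1e-8
```

Any iteration of this form produces iterates that are polynomials in A^\top A applied to the initial gradient, which is exactly the Krylov structure the paper recovers in the deterministic limit.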


Empirical Evaluation

Numerical experiments showcase:

  • Robust acceleration from larger \ell and appropriate block sizes q.
  • IS-Krylov methods outperforming classical RIMs and momentum-based acceleration, and matching or exceeding standard direct solvers on sufficiently large problems.
  • The partition-based IS-Krylov (IS-Krylov-PS) is competitive in both iterations and wall-clock time, especially for large, sparse, or ill-conditioned systems.

The choice of block size and sketch type impacts the tradeoff between iteration count and per-iteration cost, and the authors offer detailed quantitative guidance.


Practical and Theoretical Implications

This framework unifies and broadens the landscape of scalable iterative solvers:

  • One can tune \ell for a spectrum of regimes: low-memory/high-speed (small \ell) and rapid-convergence/high-accuracy (large \ell).
  • Facilitates the design of hybrid stochastic-Krylov algorithms for high-dimensional or distributed systems, compatible with sketch-and-solve and other randomized NLA paradigms.
  • The framework applies even when A is rank-deficient, over-/under-determined, or where only block sampling/sketching is available.

Theoretically, it provides insight into momentum, subspace, and history-based acceleration in randomized settings.


Future Directions

Key areas for extension include:

  • Theoretical characterization of the optimal \ell as a function of system properties and sketching scheme.
  • Extension to inconsistent or streaming systems (e.g., randomized coordinate descent as a dual analog).
  • Connections with variable-probability/structured sketching, preconditioning, and application to broader operator equations or nonlinear setups.

Conclusion

The paper delivers a methodologically rigorous and practically effective synthesis of randomized iterative and Krylov-subspace methods via an affine subspace framework. The interpolation between these domains reveals new algorithmic classes with provable and empirically validated convergence benefits. The resulting IS-Krylov strategies advance the state-of-the-art for large-scale, ill-conditioned, and structured linear systems.



Explain it Like I'm 14

Overview

This paper is about solving big sets of linear equations, written as Ax = b. These show up everywhere—from machine learning to scientific computing. There are two popular ways to solve them:

  • Randomized iterative methods, like the randomized Kaczmarz method, which take quick, small steps using random pieces of the data.
  • Krylov subspace methods, like CG and GMRES, which use very smart directions and often converge faster.

The paper builds a clear bridge between these two worlds. It shows a single framework that can behave like either method (or a mix of both), and it explains why this works and how to do it efficiently.

Key Questions and Goals

In simple terms, the paper asks:

  • Can we connect fast, random step methods with smarter “subspace” methods?
  • Can we design a new method that uses the best parts of both?
  • Can we prove it gets you to the right answer quickly?
  • Can we implement it so each step is cheap and practical for large problems?

How the Method Works (with everyday analogies)

Think of solving Ax = b like trying to guess a secret number by asking questions. You start with a guess x^0. Then:

  1. You measure how wrong your current guess is. This “wrongness” is called the residual r = Ax - b.
  2. Instead of looking at the whole residual, you randomly “sketch” it using a small matrix S. This is like listening to a few helpful hints rather than reading the whole textbook.
  3. You build a small “working space” (an affine subspace) that includes:
    • Your recent guesses (you remember the last ℓ of them), and
    • A new correction direction pointing toward improvement, made from S and the residual.
  4. Inside that small space, you choose the point closest to the true solution. This is done by a projection, like dropping a perpendicular to find the closest point.

Key idea: the paper shows how to get this “closest point” without directly knowing the exact solution (which is called the pseudoinverse solution A^† b). That makes the method practical.
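One projection step can be shown with two numbers. This toy snippet (ours, not from the paper) moves the guess to the closest point on the hyperplane defined by a single picked equation, i.e. it "drops a perpendicular":

```python
import numpy as np

# One Kaczmarz-style projection: move the guess to the closest point on the
# hyperplane of a single picked equation a^T x = beta.
a = np.array([3.0, 4.0])
beta = 10.0
x = np.array([0.0, 0.0])

x_new = x - (a @ x - beta) / (a @ a) * a   # orthogonal projection onto the plane

# The picked equation is now satisfied exactly: x_new = [1.2, 1.6].
assert abs(a @ x_new - beta) < 1e-12
```

Repeating this with randomly picked equations (or sketched combinations of them) is exactly the "quick random steps" half of the bridge.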

A few more helpful comparisons:

  • Randomized iterative method: taking quick steps using randomly sampled information.
  • Krylov subspace method: building a smart collection of directions (like a refined toolbox) by repeatedly using the matrix to improve your guess.
  • Memory length ℓ: how many past attempts you remember. More memory can lead to smarter steps.

Main Findings and Why They Matter

Here’s what the authors proved and demonstrated:

  • Linear convergence in expectation: On average, each step gets you closer by a fixed percentage. That means predictable, steady progress.
  • Using more memory helps: Increasing ℓ (remembering more past iterates) improves the convergence rate. In other words, remembering more good hints makes you learn faster.
  • Finite termination when ℓ ≥ rank(A): If you remember enough (at least as many past iterates as the matrix's rank), the method finishes in a finite number of steps, assuming exact arithmetic (no round-off errors).
  • A unifying bridge:
    • If ℓ = 1, the method reduces to familiar randomized methods (like randomized Kaczmarz).
    • If ℓ = ∞ and S = I (no sketching), the method becomes a Krylov subspace method.
  • Efficient implementation: They designed an algorithm where each step is cheap to compute. The cost grows roughly linearly with the problem size and the number of remembered steps. This is key for large-scale problems.

They also propose a new class of iterative-sketching-based Krylov methods (“IS-Krylov”), which build the smart Krylov directions using fresh random sketches at each step. That adds flexibility and can reduce the amount of data processed per step.

Why This Is Important

  • Unification leads to new algorithms: By connecting the two families, we can design hybrid methods that are both fast per step (like randomized methods) and smart in direction choice (like Krylov methods).
  • Flexibility for big data: Using random sketches means you don’t have to look at the entire dataset in every step—great for large-scale problems in machine learning and scientific computing.
  • Practical and scalable: The method avoids expensive computations (like full matrix inverses) and uses limited memory (only the last ℓ iterates), making it suitable for real-world systems with millions of variables.

Simple Takeaway

Imagine you’re trying to find the best answer as quickly as possible:

  • Random methods give you cheap, quick nudges.
  • Krylov methods give you powerful, well-aimed pushes.
  • This paper shows how to combine both: take smart, quick nudges inside a small remembered space, and do it efficiently.
  • The result is a method that steadily gets closer to the right answer, faster if you remember more, and sometimes finishes in a fixed number of steps.

In short, the paper provides a clear, unified way to solve big systems of equations more efficiently, with strong theory and practical algorithms that can make a real difference in modern applications.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a consolidated list of the main unresolved issues and opportunities for further research identified in the paper. Each item is phrased to be concrete and actionable for future work.

  • Extension beyond consistent systems: Theoretical guarantees and algorithmic modifications for inconsistent or noisy linear systems (least-squares setting) are missing; key lemmas used (e.g., equivalence of nonzero sketched residual and sketched normal vector) rely on consistency and do not directly generalize.
  • Quantifying and optimizing the convergence factor: The rate depends on the term σ_min(H^{1/2}A) and the memory-dependent factor q_k, but there are no explicit lower bounds or closed-form estimates for these quantities under common sketching distributions (row/block sampling, Gaussian, SRHT, CountSketch). Deriving distribution-specific bounds and design rules to maximize q_k is an open problem.
  • Acceptance probability and rejection sampling cost: The method resamples until S_k^T(Ax^k − b) ≠ 0, yet the expected number of trials and its impact on per-iteration complexity are not analyzed. Bounding P(Q_k) and designing Ω to guarantee high acceptance rates would make the method more predictable.
  • Adaptive selection of the memory parameter ℓ: There is no principled strategy for choosing or adapting ℓ online to balance memory, runtime, and convergence speed. Criteria based on curvature/residual trends or saturation detection could improve efficiency.
  • Stability and breakdown safeguards: The efficient update uses a Schur complement and the denominator c_k − w_k^T h_k; the paper does not analyze near-breakdowns or numerical instabilities (e.g., when this denominator approaches zero) nor propose safeguards (regularization, reorthogonalization, restarts).
  • Exploiting structure in C_k: The matrix C_k is tridiagonal, but the complexity accounting treats h_k = C_k w_k as O((k−j_k)^2) flops. Leveraging tridiagonal structure should reduce this to O(k−j_k); deriving and implementing true linear-time updates is an open improvement.
  • Preconditioning: No preconditioned variants (left/right or normal-equation preconditioning) are developed. Introducing preconditioning into both the randomized and IS-Krylov formulations and analyzing its effect on σ_min(H^{1/2}A) and q_k could substantially accelerate convergence.
  • Block size q and sketch design: The influence of sketch dimension q on convergence and complexity is not quantified. Guidelines for choosing q (and the sketch type) under sparsity/density constraints to optimize the runtime–accuracy trade-off are needed.
  • Impact of finite precision: All orthogonality and affine independence arguments assume exact arithmetic. There is no analysis of loss of orthogonality, drift in subspace quality, or remedies (selective reorthogonalization, basis conditioning) in finite precision.
  • Full specification and guarantees for IS-Krylov: The proposed IS-Krylov algorithm (Algorithm 3) is incomplete in the paper, with missing update rules and no convergence guarantees. A complete description (including p_k updates, orthogonalization, restarts) and theoretical analysis (residual minimization properties, rates) remain to be provided.
  • Precise connection to CG/GMRES: While the paper claims recovery of Krylov methods when ℓ = ∞ and Ω = {I}, it does not rigorously show when the directions become conjugate (CG for SPD) or when residual minimization (GMRES) is achieved. Formal equivalence conditions and cases where deviations occur are open.
  • Stopping criteria: The algorithm references a stopping rule but does not propose robust criteria (e.g., residual-based, error-based, or probabilistic) or analyze their behavior under noise/inconsistency.
  • Exploiting sparsity and data locality: Complexity accounts use generic costs W(S_k^T A) and W(S_k^T b). Concrete sparse implementations (row/block sketches, cache-aware updates, incremental residual maintenance) and their complexity gains are not detailed.
  • Distribution design over Ω or {Ω_k}: The paper suggests using time-varying probability spaces but provides no principled design (e.g., leverage-score sampling, adaptive residual-focused sampling, mixing strategies) or convergence analysis under nonstationary Ω_k.
  • Global complexity and comparisons: There is no iteration-complexity bound (iterations-to-accuracy) as a function of spectrum, ℓ, q, and sketch distribution, nor theoretical or empirical comparisons against RK, block RK, CG, GMRES, ASHBM, and Gearhart–Koshy accelerations on standard benchmarks.
  • Handling rank-deficiency and underdetermined systems: Although finite termination for ℓ ≥ rank(A) is shown in exact arithmetic, the behavior for rank-deficient, underdetermined, or nearly rank-deficient systems (including detection and mitigation of ill-conditioning) is not analyzed.
  • Robustness to round-off error in finite termination claims: The finite termination assertion (for ℓ ≥ rank(A)) lacks a finite-precision analysis. Conditions and practical strategies to retain fast termination (e.g., basis conditioning, scaling, restarts) are open.
  • Objective variants and regularization: Extensions to regularized problems (ridge/Tikhonov), constrained least squares, or alternative projection targets (e.g., (A^T A + λI)^{-1} A^T b) are not explored; integrating regularization into the subspace-projection framework is an opportunity.
  • High-probability and tail bounds: Convergence is shown in expectation, but high-probability guarantees, concentration results, and robustness to heavy-tailed sketches/residuals are missing.
  • Empirical reproducibility: The numerical experiments are referenced but not fully specified in the text provided (data, parameter settings, sketch types, ℓ and q choices). A reproducible evaluation suite with ablations isolating the effects of ℓ, q, and Ω would strengthen practical guidance.


Authors (3)
