Generalized Nyström Decomposition
- Generalized Nyström Decomposition is a scalable low-rank approximation technique that extends the classical method to handle arbitrary, nonsymmetric, and infinite-dimensional data.
- It employs sketching matrices and core spectral decompositions, leveraging efficient randomized algorithms to achieve near-optimal accuracy and rigorous error bounds.
- Applications span approximate SVD, PCA, compressed kernel evaluation, and tensor factorization, offering computational efficiency and numerical stability in diverse settings.
The generalized Nyström decomposition is a broad family of algorithms for scalable low-rank approximation of matrices, kernels, and operators. It generalizes the classical Nyström method, originally restricted to symmetric positive semidefinite (SPSD) kernel matrices, to encompass arbitrary rectangular, nonsymmetric, and even infinite-dimensional settings. Modern variants achieve near-optimal accuracy, provable error bounds, rigorous numerical stability, and efficient computational complexity for tasks such as approximate SVD, EVD, principal component analysis, compressed kernel evaluation, and matrix/tensor factorization. Central constructs include sketching matrices, sampled index subsets, and core matrices whose spectral decompositions are leveraged for interpolation.
1. Mathematical Formulation
The generalized Nyström decomposition approximates a given matrix $A \in \mathbb{R}^{m \times n}$ (not necessarily symmetric or square) by identifying index sets or sketching matrices that extract submatrices and project into lower-dimensional spaces. The foundation is the construction of a core matrix $W = A(I, J)$ from sampled rows $R = A(I, :)$ and columns $C = A(:, J)$, with index sets $I$, $J$ both of size $k$, yielding $A \approx C W^+ R$. Auxiliary matrices built from these sampled blocks are also defined (Nemtsov et al., 2013).
The generalized rank-$k$ approximation has the canonical form $\tilde{A}_k = C W^+ R$, where $W^+$ is the Moore–Penrose pseudoinverse of the core. For SPSD kernel matrices, this reduces to the classical Nyström formula $K \approx C W^+ C^\top$ with $R = C^\top$ (Wang et al., 2013, Arcolano et al., 2011). For tensor data, multilinear generalizations invoke mode-wise sketches and core tensors in the Tucker format (Bucci et al., 2023).
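As a concrete illustration, the classical SPSD case $K \approx C W^+ C^\top$ can be sketched in a few lines of NumPy. This is a minimal, hedged example: landmarks are drawn uniformly at random (leverage-score or adaptive sampling would typically do better), and all variable names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a small SPSD kernel matrix (RBF kernel on random points).
X = rng.standard_normal((200, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 10.0)

# Classical Nystrom: sample k landmark columns uniformly at random.
k = 40
idx = rng.choice(K.shape[0], size=k, replace=False)
C = K[:, idx]            # sampled columns, n x k
W = K[np.ix_(idx, idx)]  # core, k x k

# K ~ C W^+ C^T; a mildly regularized pseudoinverse guards the core.
K_hat = C @ np.linalg.pinv(W, rcond=1e-10) @ C.T

rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

Note that `K_hat` is symmetric by construction, since the pseudoinverse of a symmetric core is itself symmetric.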
Generalized extensions allow for arbitrary sketching matrices $\Omega \in \mathbb{R}^{n \times r}$, $\Psi \in \mathbb{R}^{m \times (r+p)}$ (Gaussian, SRHT, etc.), yielding the factors $A\Omega$ and $\Psi^\top A$ and forming $\hat{A} = A\Omega\,(\Psi^\top A \Omega)^+\,\Psi^\top A$ (Nakatsukasa, 2020, Wang et al., 2024).
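A minimal NumPy sketch of this two-sided construction, $\hat{A} = A\Omega\,(\Psi^\top A \Omega)^+\,\Psi^\top A$, using independent Gaussian sketches with an oversampled row sketch (a common choice); the test matrix and sketch sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Rectangular test matrix with geometrically decaying singular values.
m, n, r = 150, 120, 20
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 2.0 ** -np.arange(min(m, n))
A = (U[:, :n] * s) @ V.T

# Generalized Nystrom: two independent Gaussian sketches; the row
# sketch is oversampled (here by 50%) for stability and accuracy.
Omega = rng.standard_normal((n, r))            # column sketch
Psi = rng.standard_normal((m, r + r // 2))     # row sketch, oversampled

AX = A @ Omega              # m x r
YA = Psi.T @ A              # (r + p) x n
core = Psi.T @ AX           # (r + p) x r

A_hat = AX @ np.linalg.pinv(core) @ YA

rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
```

Because the spectrum of `A` decays rapidly, the rank-`r` sketch captures it to near machine precision here.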
2. Algorithmic Workflow and Core Variants
Generalized Nyström algorithms proceed as follows (Nemtsov et al., 2013, Wang et al., 2024, Nakatsukasa, 2020):
- Select or generate sketching matrices (or coordinate index sets) for the column and row spaces—these may be chosen randomly (e.g., Gaussian, SRHT), adaptively, or via leverage scores (Musco et al., 2016).
- Compute the sampled factors $C = A\Omega$, $R = \Psi^\top A$, and the core $W = \Psi^\top A \Omega$ (or $C = A(:,J)$, $R = A(I,:)$, $W = A(I,J)$ for coordinate sampling).
- Compute the spectral (SVD/EVD) decomposition of the core.
- Interpolate singular vectors or eigenvectors back to the full space using explicit formulas built from the core's singular triplets, e.g., left vectors of the form $\tilde{u}_i \propto C \tilde{v}_i / \sigma_i$ for SVD (Nemtsov et al., 2013).
- Form the low-rank approximation using the interpolated vectors and singular/eigenvalues.
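The workflow above can be sketched end to end for coordinate sampling. Uniform index sets stand in for the adaptive or pivoted choices; the interpolation step uses the factorization $C W^+ R = (C V_W \Sigma_W^{-1})(U_W^\top R)$, after which an approximate SVD of the thin product is recovered via two QRs and a small SVD. Names and tolerances are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Exactly low-rank test matrix.
m, n, k = 100, 80, 10
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))

# 1. Sample row/column index sets (uniform here; leverage scores in practice).
c = 25
J = rng.choice(n, size=c, replace=False)
I = rng.choice(m, size=c, replace=False)
C, R, W = A[:, J], A[I, :], A[np.ix_(I, J)]

# 2. Spectral decomposition (SVD) of the small core.
Uw, s, VwT = np.linalg.svd(W, full_matrices=False)
nz = s > s[0] * 1e-10              # drop numerically zero directions

# 3. Interpolate back: A ~ C W^+ R = (C V_w S^{-1}) (U_w^T R).
L = C @ VwT[nz].T / s[nz]          # m x r  (left factor)
Rt = Uw[:, nz].T @ R               # r x n  (right factor)

# 4. Approximate SVD of A from the thin factorization L @ Rt.
Ql, Rl = np.linalg.qr(L)
Qr, Rr = np.linalg.qr(Rt.T)
Us, sig, VsT = np.linalg.svd(Rl @ Rr.T)
U_hat, V_hat = Ql @ Us, Qr @ VsT.T

err = np.linalg.norm(A - (U_hat * sig) @ V_hat.T) / np.linalg.norm(A)
```

For an exactly rank-$k$ matrix and generic index sets of size at least $k$, the reconstruction $C W^+ R$ is exact, so the recovered SVD matches to machine precision.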
Variants include:
- Generalized column-only sketch (GN-c): single column sketch followed by two QR decompositions and a pseudo-inverse (Wang et al., 2024).
- Recursive sampling with ridge leverage scores for kernel matrices, yielding scaling linear in $n$ and spectral-norm guarantees (Musco et al., 2016).
- Pivoted sampling using RRQR for well-conditioned cores (Nemtsov et al., 2013).
- Greedy nuclear-score-based selection for sparse/structured cases (Fornace et al., 2024).
3. Error Bounds and Theoretical Guarantees
Generalized Nyström methods enjoy explicit, tight error bounds in operator, Frobenius, and nuclear norms. For rectangular matrices and target rank $k$, under reasonable conditioning assumptions (e.g., a nonsingular core), the spectral-norm error $\|A - C W^+ R\|_2$ is bounded by quantities involving the singular values of $A$ and those of the core $W$, measured against $\|A - A_k\|_2$, where $A_k$ is the best rank-$k$ SVD approximation (Nemtsov et al., 2013).
Adaptive sampling schemes (with leverage scores, randomized low-rank preprocessing, and residual sampling) attain relative-error bounds of the form $\|A - \hat{A}\|_F \le (1+\varepsilon)\,\|A - A_k\|_F$ with respect to the best rank-$k$ approximation $A_k$ (Wang et al., 2013, Musco et al., 2016). Recursive ridge-leverage sampling attains comparable spectral guarantees with sample size $\tilde{O}(k/\varepsilon)$ (Musco et al., 2016).
QR-based rank reduction on the core attains the optimal nuclear-norm error among all fixed-rank approximants generated from the landmarks selected (Pourkamali-Anaraki et al., 2017). For infinite-dimensional trace-class operators, tight expectation and tail bounds are derived in Hilbert–Schmidt, operator, and trace norms (Persson et al., 2024).
4. Numerical Stability and Computational Complexity
Numerical stability is achieved by using QR/SVD-based pseudoinverse computations on the core, often with explicit $\epsilon$-thresholding to avoid amplification of floating-point error from ill-conditioned cores (Nakatsukasa, 2020, Bucci et al., 2023, 2610.11493). The error incurred from truncating small singular values is rigorously bounded.
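The thresholded pseudoinverse can be sketched as follows: singular values of the core below a relative tolerance are truncated rather than inverted, so an ill-conditioned core cannot blow up the result. The function name, default tolerance, and demo matrix are illustrative choices, not a specific published implementation.

```python
import numpy as np

def stable_pinv_apply(core, B, rtol=None):
    """Apply an eps-thresholded pseudoinverse of `core` to B.

    Singular values below rtol * s_max are truncated instead of
    inverted, bounding the amplification of rounding error.
    """
    U, s, Vt = np.linalg.svd(core, full_matrices=False)
    if rtol is None:
        rtol = np.finfo(core.dtype).eps * max(core.shape)
    keep = s > rtol * s[0]
    return Vt[keep].T @ ((U[:, keep].T @ B) / s[keep, None])

# An ill-conditioned core: naive inversion would amplify noise.
rng = np.random.default_rng(3)
core = rng.standard_normal((30, 30))
U, s, Vt = np.linalg.svd(core)
s[-10:] = 1e-18                      # numerically zero tail
core = (U * s) @ Vt

B = rng.standard_normal((30, 5))
Z = stable_pinv_apply(core, B)       # well-behaved despite cond ~ 1e18
```

The truncated directions contribute an error that is itself bounded by the threshold, consistent with the rigorous bounds cited above.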
Computational cost is minimized:
- Dense matrices: $O(mn \log n + r^3)$ for the generalized algorithm with SRHT/structured sketches (Nakatsukasa, 2020).
- Sparse matrices: coordinate sampling touches only the selected rows and columns, so the cost is proportional to the sampled entries.
- The entire computation requires at most two passes over $A$ and limited memory proportional to the sketch sizes, $O((m+n)r)$, enabling streaming and out-of-core calculation.
- For kernel matrices: $O(ns)$ kernel evaluations and $\tilde{O}(ns^2)$ additional runtime for sample size $s$ (Musco et al., 2016).
- For adaptive sampling: additional passes over $A$ to compute residual scores (Wang et al., 2013).
5. Extensions: Inductive Maps, Tensors, and Infinite-Dimensional Settings
The decomposition is extended to inductive kernel methods, enabling out-of-sample extension and rapid evaluation for new points (Zhang et al., 2012). After learning a dictionary of $m$ landmarks $z_1, \ldots, z_m$, an arbitrary new point $x$ is mapped via the kernel vector $k_x = (\kappa(x, z_1), \ldots, \kappa(x, z_m))^\top$ and embedded inductively as $\phi(x) = W^{-1/2} k_x$, facilitating $O(m)$ evaluation complexity.
The multilinear Nyström decomposition generalizes the sketch/interpolate paradigm to tensors in the Tucker format. For each mode $n$, sketch matrices $\Omega_n$, $\Psi_n$ are used, and the core tensor and mode-wise factors are computed in a single pass. Error bounds match the randomized HOSVD up to moderate constants, yielding efficient compression for $d$-way arrays (Bucci et al., 2023).
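The single-pass multilinear Nyström pairs two sketches per mode; as a simplified stand-in, the following two-pass, HOSVD-style randomized sketch illustrates the mode-wise mechanics (unfold, sketch, orthonormalize, contract). Sizes and ranks are illustrative, and this is not the single-pass algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(5)

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Low-multilinear-rank 3-way test tensor in Tucker form.
r = (4, 5, 3)
G = rng.standard_normal(r)
F = [rng.standard_normal((20, r[0])),
     rng.standard_normal((25, r[1])),
     rng.standard_normal((15, r[2]))]
T = np.einsum('abc,ia,jb,kc->ijk', G, *F)

# Sketch each mode unfolding with a Gaussian matrix and orthonormalize.
ranks = (6, 7, 5)                      # slight oversampling per mode
Q = []
for mode, rk in enumerate(ranks):
    Tm = unfold(T, mode)
    Y = Tm @ rng.standard_normal((Tm.shape[1], rk))
    Qm, _ = np.linalg.qr(Y)
    Q.append(Qm)

# Core tensor by contracting every mode with Q_m^T, then reconstruct.
core = np.einsum('ijk,ia,jb,kc->abc', T, *Q)
T_hat = np.einsum('abc,ia,jb,kc->ijk', core, *Q)

rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```

Since the per-mode sketch ranks exceed the true multilinear ranks, the mode-wise projections recover the tensor essentially exactly here.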
Infinite-dimensional analogs for operators in separable Hilbert spaces use Gaussian process (GP) samples as sketch vectors and construct rank-constrained operator approximants analogous to the matrix case, with expectation/tail bounds matching SVD lower bounds up to dimension-dependent quality factors (Persson et al., 2024).
6. Adaptive Sampling, Structure-Preserving, and CUR Generalization
Generalized Nyström extends as well to CUR decompositions, selecting rows and columns adaptively based on leverage scores, residuals, or nuclear objectives (Wang et al., 2013, Fornace et al., 2024). Greedy selection via nuclear scores maximizes trace reduction and yields bounds exponentially close to those of determinantal point process (DPP) sampling. For subset selection and Laplacian reduction, structure-preserving variants handle null modes and sparsity efficiently.
7. Applications, Empirical Results, and Error Estimation
Generalized Nyström is foundational for large-scale kernel learning, streaming PCA, adaptive beamforming, image denoising, compressed graph Laplacian computation, and surrogate model construction in scientific computing (Arcolano et al., 2011, Bucci et al., 2023, Fornace et al., 2024). Empirical studies consistently demonstrate near-optimal accuracy, rapid runtimes, stability under ill-conditioning, and strong structure preservation.
To assess approximation accuracy without extra data passes, fast leave-one-out error estimators (LPO, LTO, LRO) are available. These express cross-validated error directly in terms of the already computed factors and core, and enable adaptive rank selection efficiently (Lazzarino et al., 16 Jan 2026).
In summary, the generalized Nyström decomposition synthesizes randomized and adaptive sampling, core compression, spectral interpolation, and rigorous error control to deliver versatile, efficient, and mathematically principled low-rank approximation algorithms for matrices, kernels, tensors, and operators across symmetric and nonsymmetric, finite and infinite-dimensional contexts (Nemtsov et al., 2013, Nakatsukasa, 2020, Wang et al., 2024, Persson et al., 2024).