MRRR: Robust Eigenpair Computation

Updated 18 February 2026
  • Multiple Relatively Robust Representations (MRRR) is an algorithmic framework that efficiently solves symmetric tridiagonal eigenproblems using recursively generated robust representations.
  • It leverages a spectrum-shifting and task-queue paradigm to break eigenvalue clusters, achieving O(kn) computational complexity and excellent parallel scalability.
  • Mixed-precision implementations within MRRR enhance numerical stability and orthogonality, offering accuracy comparable to traditional methods while maintaining speed.

Multiple Relatively Robust Representations (MRRR) is an algorithmic framework for the efficient solution of the real symmetric tridiagonal eigenproblem (STEP), a key subproblem arising in the computation of eigenvalues and eigenvectors of dense Hermitian matrices after tridiagonal reduction. MRRR is distinguished by its use of recursively generated, carefully factored matrix representations, each a "Relatively Robust Representation" (RRR), to extract eigenpairs accurately and efficiently. Its primary contributions are a computational complexity of $O(kn)$ for $k$ eigenpairs of an $n \times n$ matrix (with $k \approx n$ in typical full-spectrum computations), modest memory requirements, and strong parallelizability. MRRR achieves these through a spectrum-shifting and task-queue paradigm that breaks clusters of close eigenvalues via recursively constructed RRRs. The method's accuracy, performance, and scalability have been the subject of extensive theoretical and empirical investigation, especially in recent mixed-precision variants.

1. Mathematical Foundations and Classical Algorithm

The STEP requires the solution of

$$T z_i = \lambda_i z_i, \qquad \|z_i\|_2 = 1, \qquad \lambda_1 \leq \dots \leq \lambda_n$$

for a real, symmetric, tridiagonal $T \in \mathbb{R}^{n \times n}$. The MRRR approach is built upon the following core concepts:

  • Relatively Robust Representation (RRR): An RRR for a subset of eigenvalues indexed by $\mathcal{I} \subset \{1, \ldots, n\}$ is a factored form of a shifted tridiagonal,

$$M = T - \sigma I = L D L^T \text{ or } U \Omega U^T \text{ (or twisted/block variants)},$$

where, for any small element-wise relative perturbation $|\widetilde{x}_i - x_i| \leq k\,\varepsilon\,|x_i|$, the associated eigenvalues and invariant subspaces in $\mathcal{I}$ remain well-conditioned. Specifically,

$$|\widetilde{\lambda}_i - \lambda_i| \leq k_{rr}\,n\,\xi\,|\lambda_i|, \qquad \sin \angle(\widetilde{\mathcal{Z}}_{\mathcal{I}}, \mathcal{Z}_{\mathcal{I}}) \leq \frac{k_{rr}\,n\,\xi}{\mathrm{relgap}(\mathcal{I})}$$

for moderate $k_{rr}$ and small $\xi$.
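The practical content of relative robustness can be demonstrated numerically: small element-wise relative perturbations of the factors $(L, D)$ of a positive-definite $LDL^T$ representation preserve even the tiniest eigenvalues to high relative accuracy, whereas an equally small absolute perturbation of $T$ itself destroys them. A minimal NumPy sketch (the matrix, grading, and perturbation sizes are illustrative choices, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# A positive-definite LDL^T representation whose diagonal spans six
# orders of magnitude, so T has eigenvalues of very different sizes.
l = rng.uniform(-0.1, 0.1, n - 1)   # subdiagonal of the unit bidiagonal L
d = np.logspace(0, -6, n)           # diagonal of D

def ldlt(l, d):
    L = np.eye(len(d)) + np.diag(l, -1)
    return L @ np.diag(d) @ L.T

lam = np.linalg.eigvalsh(ldlt(l, d))   # ascending; lam[0] is tiny (~1e-6)
eps = 1e-8

# Small element-wise *relative* perturbation of the factors (L, D):
# every eigenvalue, however tiny, moves by a small relative amount.
lam_rep = np.linalg.eigvalsh(ldlt(l * (1 + eps * rng.uniform(-1, 1, n - 1)),
                                  d * (1 + eps * rng.uniform(-1, 1, n))))
print(np.max(np.abs(lam_rep - lam) / np.abs(lam)))   # small, order n*eps

# An *absolute* perturbation of T of the same size (shifting by eps*I
# moves every eigenvalue by exactly eps) ruins the relative accuracy
# of the smallest eigenvalue.
lam_abs = np.linalg.eigvalsh(ldlt(l, d) + eps * np.eye(n))
print(abs(lam_abs[0] - lam[0]) / lam[0])   # orders of magnitude larger
```

The contrast is the essence of why MRRR works with factored representations rather than with the entries of $T$ directly.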

  • Algorithmic Structure: MRRR constructs a queue of "tasks," each consisting of a representation $M$ and an index set $\mathcal{I}$. Singleton tasks (with $|\mathcal{I}| = 1$) yield eigenpairs via Rayleigh-quotient iteration (RQI) and twisted factorizations; clustered tasks ($|\mathcal{I}| > 1$) trigger a shift, produce a new RRR, and subdivide the cluster by recursively building child tasks.

Algorithmic steps:

  1. Preprocessing: scale and split $T$ at tiny off-diagonal entries.
  2. Choose an initial root shift $\mu$, form $M_{root} = T - \mu I$, and perturb it.
  3. Compute eigenvalue estimates for $M_{root}$ to the relative accuracy set by $gaptol$.
  4. Initialize a task queue with $\{ M_{root}, \mathcal{I}_{in}, \mu \}$.
  5. Iteratively process the queue:
    • Partition $\mathcal{I}$ into clusters according to $\mathrm{relgap}$.
    • For a singleton, execute RQI and back-shift.
    • For a cluster, select a shift $\tau$, form a new RRR $M_{shifted}$, and enqueue the subtask.
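The classification in step 5 can be sketched in a few lines: indices whose relative gap to a neighbour falls below $gaptol$ are grouped into one cluster task, and the rest become singletons (the function name and threshold handling below are illustrative, not from the source):

```python
import numpy as np

def partition_by_relgap(lams, gaptol):
    """Group sorted eigenvalue approximations: neighbours whose relative
    gap is below gaptol share a cluster task (hypothetical helper)."""
    groups, current = [], [0]
    for i in range(1, len(lams)):
        gap = lams[i] - lams[i - 1]
        relgap = gap / max(abs(lams[i]), abs(lams[i - 1]), np.finfo(float).tiny)
        if relgap < gaptol:
            current.append(i)        # still inside the same cluster
        else:
            groups.append(current)   # close the previous task
            current = [i]
    groups.append(current)
    return groups                    # singletons -> RQI, clusters -> new RRR

lams = np.array([-1.0, 0.999999, 1.000001, 1.000002, 5.0])
print(partition_by_relgap(lams, gaptol=1e-3))
# [[0], [1, 2, 3], [4]]: singleton, 3-cluster, singleton
```

Each cluster returned here would, in MRRR proper, be shifted close to the cluster and refactored into a child RRR before re-entering the queue.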

The arithmetic complexity is $O(kn)$ for $k$ eigenpairs, with $O(n)$ auxiliary storage, and the algorithm naturally supports both depth-first and breadth-first traversals of the computation tree (Petschow et al., 2013, Petschow, 2014).
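LAPACK's xSTEMR drivers implement this algorithm, and SciPy exposes them directly on a tridiagonal given by its diagonals; a minimal usage sketch:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

# Symmetric tridiagonal T, given by its main diagonal d and off-diagonal e.
n = 6
d = np.full(n, 2.0)
e = np.full(n - 1, -1.0)

# lapack_driver='stemr' requests LAPACK's MRRR-based driver.
lam, Z = eigh_tridiagonal(d, e, lapack_driver='stemr')

# Verify T z_i = lambda_i z_i with unit-norm, ascending eigenpairs.
T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
print(np.max(np.abs(T @ Z - Z * lam)))        # residual near machine epsilon
print(np.max(np.abs(Z.T @ Z - np.eye(n))))    # orthogonality near machine epsilon
```

For dense Hermitian problems, the same kernel is reached after Householder tridiagonalization, as noted in the introduction.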

2. Theoretical Properties and Error Analysis

The quality and stability of MRRR depend on several analytical guarantees:

  • Residual and Orthogonality Bounds: If all RRRs meet relative robustness and conditional element growth, then for unit roundoff $\varepsilon$ and deepest cluster-tree depth $d_{max}$,

$$\| M_{root} \hat{z}_i - \hat{\lambda}_i[M_{root}] \hat{z}_i \| = O\!\left( r^{(local)} + d_{max}\, n\, \varepsilon\, \mathrm{spdiam}(T) \right)$$

$$|\hat{z}_i^T \hat{z}_j| = O\!\left( n \varepsilon + \frac{n(\xi_\downarrow + \xi_\uparrow)\, d_{max}}{gaptol} \right) \approx O(n \varepsilon)$$

where $gaptol$ sets the cluster-splitting sensitivity and $r^{(local)}$ is the local residual (cf. Petschow et al., 2013; Petschow, 2014).

  • Comparison to Other Methods: Classical MRRR yields worst-case orthogonality $O(n \varepsilon)$, as opposed to $O(\sqrt{n}\, \varepsilon)$ for the QR and Divide–Conquer methods. In empirical tests, MRRR may lose orthogonality up to $10^3\, n \varepsilon$, while the alternatives remain below $10 \sqrt{n}\, \varepsilon$ (Petschow et al., 2013).
  • Cluster Control via $gaptol$: Tuning $gaptol$ manages the trade-off between breaking clusters (smaller values improve performance and parallelism) and preserving orthogonality (which degrades as $gaptol$ shrinks).
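The orthogonality levels quoted above can be checked empirically by measuring $\max_{i,j} |\hat{z}_i^T \hat{z}_j - \delta_{ij}|$ for a computed eigenvector matrix; a small sketch (the random test matrix is an illustrative choice):

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def orthogonality_loss(Z):
    """max |Z^T Z - I|: the quantity bounded by O(n*eps) for MRRR."""
    return np.max(np.abs(Z.T @ Z - np.eye(Z.shape[1])))

n = 200
rng = np.random.default_rng(1)
d, e = rng.standard_normal(n), rng.standard_normal(n - 1)

_, Z = eigh_tridiagonal(d, e, lapack_driver='stemr')
eps = np.finfo(np.float64).eps
print(orthogonality_loss(Z) / (n * eps))  # typically O(1) on easy spectra
```

On matrices with tight eigenvalue clusters, the same ratio can grow toward the $10^3$ range reported above, which is what motivates the mixed-precision variant of the next section.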

3. Mixed-Precision MRRR and Improved Accuracy

Mixed-precision innovations address MRRR’s classical orthogonality limitations:

  • Precision Levels: Input/output is in $x$-bit precision (unit roundoff $\varepsilon_x$); internal sensitive steps run in higher $y$-bit precision ($\varepsilon_y \ll \varepsilon_x$). Examples: single → double or double → quadruple.
  • Critical Operations: All spectrum shifts, RRR constructions, and RQI are conducted in $y$-bit precision to make $\xi_\downarrow, \xi_\uparrow, \alpha, \eta = O(\varepsilon_y) \ll \varepsilon_x$. Non-critical routines (e.g., bisection for eigenvalue refinement) may remain in $x$-bit precision to minimize the performance cost (Petschow et al., 2013).
  • Enhanced Error Bounds: With appropriate $gaptol \geq \max\{ 10^{-3},\ \varepsilon_y \sqrt{n}/\varepsilon_x \}$ and a typically shallow $d_{max}$, mixed-precision MRRR achieves

$$|\hat{z}_i^T \hat{z}_j| = O(\varepsilon_x \sqrt{n})$$

and

$$\| M_{root} \hat{z}_i - \hat{\lambda}_i \hat{z}_i \| = O\!\left( r^{(local)} + d_{max}\, n\, \varepsilon_y\, \mathrm{spdiam}(T) \right)$$

This matches the orthogonality of QR/Divide–Conquer methods while preserving the speed and scalability of MRRR (Petschow et al., 2013).
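The precision bookkeeping can be emulated outside MRRR with any eigensolver: take $x$ = single and $y$ = double, promote the input, do the arithmetic in double, and demote the vectors. The sketch below uses a dense solver as a stand-in for the $y$-bit core (it illustrates only the $x$/$y$ conversion pattern, not the MRRR internals):

```python
import numpy as np

n = 300
rng = np.random.default_rng(3)
d = rng.standard_normal(n).astype(np.float32)
e = rng.standard_normal(n - 1).astype(np.float32)
T32 = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)   # x-bit input

# y-bit core: promote, solve, then demote the eigenvectors back to x-bit.
lam, Z = np.linalg.eigh(T32.astype(np.float64))
Z32 = Z.astype(np.float32)                          # x-bit output

# Orthogonality of the x-bit vectors, evaluated in double precision.
Zd = Z32.astype(np.float64)
loss = np.max(np.abs(Zd.T @ Zd - np.eye(n)))
eps_x = np.finfo(np.float32).eps
print(loss / (np.sqrt(n) * eps_x))  # O(1): within the O(eps_x sqrt(n)) bound
```

The point of the mixed-precision MRRR analysis is that the same $O(\varepsilon_x \sqrt{n})$ level is reached while keeping the $O(kn)$ cost, rather than paying for a dense $y$-bit solve.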

4. Scalability, Parallel Performance, and Implementation

MRRR is designed for high scalability in both shared-memory and distributed environments:

  • Multi-core (SMP) Parallelism: The algorithm is expressed in terms of independent tasks (singleton and cluster), enabling a shared FIFO queue processed by concurrent threads. Load balancing is enhanced via R-tasks which subdivide large clusters’ eigenvalue refinement (Petschow, 2014).
  • Distributed-Memory (MPI) Parallelism: Eigenpairs are distributed over $p$ processes (approximately $n/p$ per process). Each process operates on its set of indices, with minimal communication except when clusters span processes. Nonblocking MPI is used for the necessary broadcasts and to overlap computation with communication (Petschow, 2014).
  • Memory and Data Layout: MRRR requires only $O(nk)$ workspace, substantially less than the $O(n^2)$ needed by Divide–Conquer for explicit eigenvector backtransforms. Mixed precision adds a negligible $O(pn)$ overhead (Petschow, 2014).
  • Performance Highlights:
    • On platforms like the Intel Xeon X7550 (32 cores), single→double mixed-precision MRRR outperforms LAPACK's single- and double-precision MRRR drivers (SSTEMR, DSTEMR) and is faster than or comparable to Divide–Conquer, with substantially improved orthogonality (Petschow et al., 2013).
    • In large dense eigensolvers (including the tridiagonal reduction stage), the mixed-precision tridiagonal solution is not rate limiting but produces vectors matching the quality of QR and DC (Petschow et al., 2013).
    • Parallel implementations such as PMRRR and MR3SMP achieve 80–90% efficiency scaling to thousands of cores (Petschow, 2014).
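The static assignment of roughly $n/p$ eigenpair indices per process can be sketched as follows (hypothetical helper; the name and block layout are illustrative):

```python
def distribute_indices(k, p):
    """Split k eigenpair indices over p processes as evenly as possible,
    giving each process a contiguous block of roughly k/p indices."""
    base, extra = divmod(k, p)
    blocks, start = [], 0
    for rank in range(p):
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

print([list(b) for b in distribute_indices(10, 3)])
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

In a PMRRR-style solver, a cluster that straddles one of these block boundaries is exactly the case that triggers the inter-process communication described above.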

5. Algorithmic and Practical Considerations

Table 1 summarizes recommended parameter choices and practical guidelines for mixed-precision MRRR-based eigensolvers, as given in (Petschow et al., 2013):

| Component | Recommendation / Notes |
| --- | --- |
| $gaptol$ | $\max\{10^{-3},\ \varepsilon_y \sqrt{n}/\varepsilon_x\} \leq gaptol \leq 10^{-3}$; push lower for parallelism |
| RRR constants | $k_{elg} \leq \max\{10,\ \varepsilon_x/(\varepsilon_y \sqrt{n})\}$; $k_{rr} \leq \max\{10,\ \varepsilon_x/(\varepsilon_y \sqrt{n}\, gaptol)\}$ |
| RQI stopping | $r^{(local)} \leq \mathrm{gap} \times \varepsilon_x \sqrt{n}$, with $k_{rs} = 1$ |
| Orthogonality | Expect $O(\varepsilon_x \sqrt{n})$ with proper $gaptol$ |
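Transcribed into code, the table's recommendations might look like the following (a sketch; the formulas are those of Table 1, while the function name and the choice of the lower end of the $gaptol$ range are illustrative):

```python
import numpy as np

def mixed_precision_params(n, eps_x, eps_y):
    """Parameter choices following Table 1 above (sketch only)."""
    sqrt_n = np.sqrt(n)
    # Chosen at the recommended bound; may be pushed lower for parallelism.
    gaptol = max(1e-3, eps_y * sqrt_n / eps_x)
    k_elg = max(10.0, eps_x / (eps_y * sqrt_n))
    k_rr = max(10.0, eps_x / (eps_y * sqrt_n * gaptol))
    return {"gaptol": gaptol, "k_elg": k_elg, "k_rr": k_rr}

# single -> double precision: eps_x ~ 6e-8, eps_y ~ 2.2e-16.
params = mixed_precision_params(10_000, 6.0e-8, 2.2e-16)
print(params)
```

For the single→double case the $\varepsilon_y \sqrt{n}/\varepsilon_x$ term stays far below $10^{-3}$ even for very large $n$, so the classical $gaptol = 10^{-3}$ remains admissible.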

Key workflow steps:

  1. Input: Read $T$ in $x$-bit precision; convert to $y$-bit for the RRR work if $y$-bit or mixed arithmetic is not hardware-accelerated.
  2. Preprocessing: Scale/split in $x$-bit precision.
  3. RRR Core: Perform all sensitive operations in $y$-bit precision; apply a perturbation of magnitude $O(\varepsilon_x)$.
  4. Eigenvalue Refinement: Coarse and fine bisection in $x$-bit precision.
  5. Main Loop: Queue tasks; for singletons, run twisted-factorization RQI in $y$-bit precision to residual $r^{(local)}$; for clusters, form child RRRs.
  6. Backconvert: Cast the final $\hat{z}_i$ to $x$-bit precision; enforce orthonormality if desired.
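Step 6's back-conversion, with its optional orthonormality cleanup, might be sketched as follows (assuming $x$ = single and $y$ = double; the QR-based cleanup is one possible choice, not prescribed by the source):

```python
import numpy as np

def backconvert(Z_y, enforce_orthonormal=False):
    """Demote y-bit (double) eigenvectors to x-bit (single); optionally
    restore orthonormality in x-bit with a thin QR factorisation."""
    Z_x = Z_y.astype(np.float32)
    if enforce_orthonormal:
        # QR preserves the computed subspace while making the columns
        # orthonormal to x-bit working precision.
        Z_x, _ = np.linalg.qr(Z_x)
    return Z_x

# Usage: orthonormal double-precision vectors in, single-precision out.
Q, _ = np.linalg.qr(np.random.default_rng(4).standard_normal((50, 50)))
Z32 = backconvert(Q, enforce_orthonormal=True)
print(Z32.dtype)  # float32
```

With the mixed-precision bounds above, the demoted vectors already satisfy $O(\varepsilon_x \sqrt{n})$ orthogonality, so the QR cleanup is usually optional.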

6. Comparative Landscape and Applicability

MRRR is contrasted with alternatives as follows:

  • Divide–Conquer (DC): $O(n^3)$ worst-case arithmetic cost, $O(n^2)$ storage, high BLAS-3 intensity; sometimes fastest for well-deflated, full-spectrum cases at moderate core counts ($\leq 16$), but loses to MRRR for larger parallel jobs (Petschow, 2014).
  • QR Iteration: Also $O(n^3)$; less favorable for large $n$ or partial spectra.
  • Bisection + Inverse Iteration: $O(kn + k^2 n)$ cost; poor scalability in the presence of eigenvalue clusters.
  • MRRR: $O(kn)$ cost, minimal memory, scalable; especially advantageous when $k \ll n$ (partial spectrum), on massive or hybrid compute resources, or when orthogonality requirements demand mixed precision (Petschow et al., 2013, Petschow, 2014).
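The $k \ll n$ case corresponds to the subset options of tridiagonal eigensolvers; e.g., through SciPy one can request an index range so that only the desired eigenpairs are computed (a usage sketch):

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

n = 2000
rng = np.random.default_rng(5)
d, e = rng.standard_normal(n), rng.standard_normal(n - 1)

# Request only the 10 smallest eigenpairs by index range
# (select='i' takes 0-based, inclusive indices), not the full spectrum.
lam, Z = eigh_tridiagonal(d, e, select='i', select_range=(0, 9))
print(lam.shape, Z.shape)
```

This is precisely the partial-spectrum regime in which MRRR's $O(kn)$ cost dominates the $O(n^3)$-type alternatives listed above.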

Typical recommended scenarios for MRRR or mixed-precision MRRR:

  • Large-scale problems ($n \geq 5000$), where MRRR is fastest and high-quality eigenvectors are needed for spectral clustering or as Rayleigh–Ritz bases.
  • Multi-threaded/distributed settings facing load imbalance from eigenvalue clustering; an aggressively small $gaptol$ disaggregates clusters and greatly enhances parallelism.
  • Emerging mixed-precision hardware where $y$-bit operations are no longer performance-limiting relative to $x$-bit (Petschow et al., 2013).

7. Implementation Notes and References

  • Practical implementations use explicit factor storage (lower-bidiagonal $L, D$ or the $e$-representation), with recommended block sizes for the dense stages (96–128 for $n \sim 10^4$–$10^5$ in Elemental or ScaLAPACK) (Petschow, 2014).
  • Careful attention to stable spectrum shifting (typically via DSTQDS/DQDS) is required to maintain element-wise mixed stability.
  • Depth-first recursion is particularly efficient in mixed precision, as it collapses the cluster depth to $d_{max} \leq 1$ (Petschow, 2014).
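The shift $L D L^T - \tau I = L_+ D_+ L_+^T$ is computed by the differential stationary qd transform (dstqds). A straightforward, non-optimized transcription of the recurrence, with a numerical check (a sketch only; production codes add the element-growth monitoring discussed above):

```python
import numpy as np

def dstqds(d, l, tau):
    """Given T = L D L^T (unit lower-bidiagonal L with subdiagonal l,
    diagonal d), return (d_plus, l_plus) such that
    L+ D+ L+^T = L D L^T - tau * I."""
    n = len(d)
    d_plus = np.empty(n)
    l_plus = np.empty(n - 1)
    s = -tau                            # the "differential" auxiliary quantity
    for i in range(n - 1):
        d_plus[i] = d[i] + s
        l_plus[i] = d[i] * l[i] / d_plus[i]
        s = l_plus[i] * l[i] * s - tau
    d_plus[-1] = d[-1] + s
    return d_plus, l_plus

# Numerical check on a small random representation.
rng = np.random.default_rng(6)
n = 6
d = rng.uniform(1.0, 2.0, n)
l = rng.uniform(-0.3, 0.3, n - 1)
tau = 0.3

def ldlt(d, l):
    L = np.eye(len(d)) + np.diag(l, -1)
    return L @ np.diag(d) @ L.T

dp, lp = dstqds(d, l, tau)
print(np.max(np.abs(ldlt(dp, lp) - (ldlt(d, l) - tau * np.eye(n)))))
# residual near machine epsilon
```

The differential form computes each $d_{+,i}$ from the auxiliary quantity $s$ rather than by explicit subtraction of large intermediates, which is what yields the element-wise mixed stability noted above.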

Principal references include:

  • Dhillon & Parlett, SIAM J. Matrix Anal. Appl. 2004 (original algorithm).
  • Willems & Lang, BIT 2011 (perturbation theory).
  • Bientinesi et al., PMRRR (parallel implementation).
  • Vömel, LAPACK Working Note 2010 (ScaLAPACK’s MRRR).
  • (Petschow et al., 2013) for mixed-precision methodology and extensive accuracy/performance data.
  • (Petschow, 2014) for multi-core/distributed algorithms, tuning, and comparative studies.

MRRR, particularly when augmented with mixed-precision techniques, now provides eigensolvers that are simultaneously scalable, accurate, and memory-efficient, enabling robust large-scale spectral computations in both traditional and emerging computational environments.
