MRRR: Robust Eigenpair Computation
- Multiple Relatively Robust Representations (MRRR) is an algorithmic framework that efficiently solves symmetric tridiagonal eigenproblems using recursively generated robust representations.
- It leverages a spectrum-shifting and task-queue paradigm to break eigenvalue clusters, achieving O(kn) computational complexity and excellent parallel scalability.
- Mixed-precision implementations within MRRR enhance numerical stability and orthogonality, offering accuracy comparable to traditional methods while maintaining speed.
Multiple Relatively Robust Representations (MRRR) is an algorithmic framework for the efficient solution of the real symmetric tridiagonal eigenproblem (STEP), a key subproblem arising in the computation of eigenvalues and eigenvectors of dense Hermitian matrices after tridiagonal reduction. MRRR is distinguished by its use of recursively generated, carefully factored matrix representations—each being a "Relatively Robust Representation" (RRR)—to extract eigenpairs accurately and efficiently. Its primary contributions are a computational complexity of O(kn) for k eigenpairs of an n×n matrix (with k = n in typical full-spectrum computations), modest memory requirements, and strong parallelizability. MRRR achieves these through a spectrum-shifting and task-queue paradigm that breaks clusters of close eigenvalues via recursively constructed RRRs. The method’s accuracy, performance, and scalability have been the subject of extensive theoretical and empirical investigation, especially in recent mixed-precision variants.
1. Mathematical Foundations and Classical Algorithm
The STEP requires the solution of
T z_i = λ_i z_i,  i = 1, …, n,
for a real, symmetric, tridiagonal T ∈ R^{n×n}. The MRRR approach is built upon the following core concepts:
- Relatively Robust Representation (RRR): An RRR for a subset of eigenvalues indexed by Γ is a factored form of a shifted tridiagonal,
L D L^T = T − σI,
where, for any small element-wise relative perturbation of the entries of L and D, the associated eigenvalues and invariant subspaces indexed by Γ remain well-conditioned. Specifically,
|δ(λ_i − σ)| ≤ C n ξ |λ_i − σ| for all i ∈ Γ,
for moderate C and small element-wise perturbations of relative size ξ.
- Algorithmic Structure: MRRR constructs a queue of "tasks," each consisting of a representation (L, D, σ) and an index set Γ. Singleton tasks (with |Γ| = 1) yield eigenpairs via Rayleigh-quotient iteration (RQI) and twisted factorizations; clustered tasks (|Γ| > 1) trigger a shift, produce a new RRR, and subdivide the cluster by recursively building child tasks.
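As a concrete illustration of the factored form underlying an RRR, the following minimal pure-Python sketch (the function name and code are illustrative, not taken from any cited implementation) computes L D L^T = T − σI for a symmetric tridiagonal T:

```python
def ldlt_shifted(d, e, sigma):
    """Factor T - sigma*I = L*D*L^T for a symmetric tridiagonal T with
    diagonal d (length n) and off-diagonal e (length n-1).
    Returns the subdiagonal of the unit lower-bidiagonal L and the
    diagonal of D. Production codes would reject or perturb a shift
    sigma for which a pivot D[i] vanishes or element growth occurs."""
    n = len(d)
    D = [0.0] * n
    L = [0.0] * (n - 1)
    D[0] = d[0] - sigma
    for i in range(1, n):
        L[i - 1] = e[i - 1] / D[i - 1]                 # bidiagonal entry
        D[i] = (d[i] - sigma) - L[i - 1] * e[i - 1]    # next pivot
    return L, D
```

Multiplying the factors back reproduces T − σI in exact arithmetic; a real implementation additionally tests the candidate factorization for element growth and relative robustness before accepting it as an RRR.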
Algorithmic steps:
- Preprocessing: scale and split at tiny off-diagonals.
- Choose an initial root shift σ_0, form T − σ_0 I = L_0 D_0 L_0^T, and perturb the factors slightly.
- Compute eigenvalue estimates λ_i for i ∈ Γ_0 to the relative accuracy set by gaptol.
- Initialize a task queue with the root task (L_0, D_0, Γ_0 = {1, …, n}).
- Iteratively process queue:
- Partition Γ into singletons and clusters according to relative gaps and gaptol.
- For a singleton, execute RQI and back-shift the computed eigenvalue.
- For a cluster, select a shift σ close to the cluster, form a new RRR, and enqueue the subtask.
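The eigenvalue-estimate step above can be sketched with Sturm-count bisection; this is a simplified pure-Python stand-in (illustrative names) for the bisection/dqds machinery of production codes:

```python
def neg_count(d, e, x):
    """Number of eigenvalues of the symmetric tridiagonal T (diagonal d,
    off-diagonal e) strictly less than x, via the Sturm sequence of
    T - x*I (count of negative pivots in its LDL^T factorization)."""
    count = 0
    q = d[0] - x
    if q < 0.0:
        count += 1
    for i in range(1, len(d)):
        if q == 0.0:
            q = 1e-300          # avoid division by zero on exact hits
        q = d[i] - x - e[i - 1] ** 2 / q
        if q < 0.0:
            count += 1
    return count

def bisect_eigenvalue(d, e, k, lo, hi, tol=1e-12):
    """Refine the k-th smallest eigenvalue (k = 1, 2, ...) inside [lo, hi]."""
    while hi - lo > tol * max(abs(lo), abs(hi), 1.0):
        mid = 0.5 * (lo + hi)
        if neg_count(d, e, mid) >= k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

For the 2×2 matrix with diagonal (2, 2) and off-diagonal 1, whose eigenvalues are 1 and 3, `bisect_eigenvalue(d, e, 1, 0.0, 4.0)` converges to 1.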
The arithmetic complexity is O(kn) for k eigenpairs, with O(n) auxiliary storage, and the algorithm naturally supports both depth- and breadth-first traversals of the computation tree (Petschow et al., 2013, Petschow, 2014).
2. Theoretical Properties and Error Analysis
The quality and stability of MRRR depend on several analytical guarantees:
- Residual and Orthogonality Bounds: If all RRRs meet relative robustness and conditional element growth, then for unit roundoff ε and deepest cluster-tree depth d the computed eigenpairs satisfy bounds of the form
||T z_i − λ_i z_i|| = O(n ε ||T||),  |z_i^T z_j| = O(d n ε / gaptol) for i ≠ j,
where gaptol sets the cluster-splitting sensitivity and ||T z_i − λ_i z_i|| is the local residual (cf. (Petschow et al., 2013, Petschow, 2014)).
- Comparison to Other Methods: Classical MRRR yields worst-case orthogonality O(n ε / gaptol), as opposed to O(n ε) for QR and Divide–Conquer methods. With the customary gaptol ≈ 10⁻³, MRRR may thus lose orthogonality by up to three orders of magnitude relative to the alternatives, which remain at the n ε level in empirical tests (Petschow et al., 2013).
- Cluster Control via gaptol: Tuning gaptol manages the trade-off between breaking clusters (a smaller gaptol declares fewer eigenvalues clustered, improving performance and parallelism) and orthogonality (whose bound degrades as gaptol shrinks).
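The effect of the cluster-splitting tolerance can be demonstrated with a small self-contained routine (illustrative, not from the cited implementations): adjacent eigenvalue approximations whose relative gap falls below the tolerance are grouped into one cluster.

```python
def partition_clusters(lams, gaptol):
    """Partition sorted eigenvalue approximations into clusters:
    adjacent values whose relative gap is below gaptol share a cluster.
    Returns a list of index lists."""
    clusters = [[0]]
    for i in range(1, len(lams)):
        gap = abs(lams[i] - lams[i - 1])
        scale = max(abs(lams[i]), abs(lams[i - 1]), 1e-300)
        if gap / scale < gaptol:
            clusters[-1].append(i)   # relative gap too small: same cluster
        else:
            clusters.append([i])     # well separated: start a new cluster
    return clusters
```

For `lams = [1.0, 1.0005, 2.0]`, a tolerance of `1e-3` groups the first two values into one cluster, while `1e-4` yields three singletons; a smaller tolerance thus creates more independent tasks, at the price of a weaker orthogonality bound.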
3. Mixed-Precision MRRR and Improved Accuracy
Mixed-precision innovations address MRRR’s classical orthogonality limitations:
- Precision Levels: Input/output is held in the working precision (unit roundoff ε); internal sensitive steps run in a higher precision (unit roundoff ε̄ ≪ ε). Examples: single → double or double → quadruple.
- Critical Operations: All spectrum shifts, RRR constructions, and RQI are conducted in the higher precision, driving the dominant error term ε̄ / gaptol down to the order of ε. Non-critical routines (e.g., bisection for eigenvalue refinement) may remain in the working precision to minimize performance cost (Petschow et al., 2013).
- Enhanced Error Bounds: With ε̄ chosen so that ε̄ / gaptol = O(ε), and the typically shallow cluster trees this induces, mixed-precision MRRR achieves
||T z_i − λ_i z_i|| = O(n ε ||T||)
and
|z_i^T z_j| = O(n ε), i ≠ j.
This matches the orthogonality of QR/Divide–Conquer methods while preserving the speed and scalability of MRRR (Petschow et al., 2013).
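Pure Python computes in binary64, but the single → double configuration can be modeled by explicitly rounding through binary32 with the standard `struct` module. This is only a toy model of the precision split, not the cited implementation:

```python
import struct

def to_binary32(x):
    """Round a binary64 value to the nearest binary32 and back."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Working-precision (single) input: each entry carries an O(2^-24) error.
d_single = [to_binary32(v) for v in [2.0, 2.5, 3.0]]

# Higher-precision (double) internal work: the binary32 inputs are exact
# in binary64, so a critical operation such as the shift d[i] - sigma
# commits only O(2^-53) error here, which is what buys back the
# 1/gaptol accuracy loss of the classical algorithm.
sigma = to_binary32(2.0)
shifted = [di - sigma for di in d_single]

# Backconvert the result to the working precision for output.
out = [to_binary32(s) for s in shifted]
```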
4. Scalability, Parallel Performance, and Implementation
MRRR is designed for high scalability in both shared-memory and distributed environments:
- Multi-core (SMP) Parallelism: The algorithm is expressed in terms of independent tasks (singleton and cluster), enabling a shared FIFO queue processed by concurrent threads. Load balancing is enhanced via R-tasks which subdivide large clusters’ eigenvalue refinement (Petschow, 2014).
- Distributed-Memory (MPI) Parallelism: The k eigenpairs are distributed over the p processes (approximately k/p per process). Each process operates on its own set of indices, with minimal communication except when clusters span process boundaries. Nonblocking MPI is used for the necessary broadcasts and to overlap computation with communication (Petschow, 2014).
- Memory and Data Layout: MRRR requires only O(n) workspace, substantially less than the O(n²) needed by Divide–Conquer for explicit eigenvector backtransforms. Mixed precision adds a negligible memory overhead (Petschow, 2014).
- Performance Highlights:
- On platforms like Intel Xeon X7550 (32 cores), single→double mixed-precision MRRR outperforms LAPACK’s single/double-precision MRRR (`SSTEMR`, `DSTEMR`) and is either faster than or comparable to Divide–Conquer, with substantially improved orthogonality (Petschow et al., 2013).
- In large dense eigensolvers (including the tridiagonal reduction stage), the mixed-precision tridiagonal solution is not rate limiting but produces vectors matching the quality of QR and DC (Petschow et al., 2013).
- Parallel implementations such as PMRRR and MR3SMP achieve 80–90% efficiency scaling to thousands of cores (Petschow, 2014).
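The distribution of eigenpairs over processes can be sketched as a standard block partition (illustrative only; PMRRR's actual assignment additionally accounts for cluster boundaries):

```python
def local_range(k, p, rank):
    """Half-open index range [lo, hi) of eigenpairs owned by process
    `rank` when k eigenpairs are block-distributed over p processes;
    the first k mod p ranks each receive one extra index."""
    base, rem = divmod(k, p)
    lo = rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)
    return lo, hi
```

For example, 10 eigenpairs over 3 processes gives the ranges [0, 4), [4, 7), [7, 10); a cluster whose index set crosses one of these boundaries is precisely the case that requires inter-process communication.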
5. Algorithmic and Practical Considerations
Table 1 summarizes recommended parameter choices and practical guidelines for mixed-precision MRRR-based eigensolvers, as given in (Petschow et al., 2013):
| Component | Recommendation / Notes |
|---|---|
| gaptol | ≈ 10⁻³ by default; push lower for parallelism |
| RRR constants | moderate relative condition numbers; bounded local element growth (accept a factorization only when both hold) |
| RQI stopping | stop once the residual falls below its gap-based threshold, evaluated in the higher precision |
| Orthogonality | expect O(n ε) with proper gaptol and ε̄ |
Key workflow steps:
- Input: Read in the working precision; convert to the higher precision for RRR work if mixed arithmetic is not hardware-accelerated.
- Preprocessing: Scale/split in the working precision.
- RRR Core: All sensitive operations in the higher precision; apply perturbations of relative magnitude O(ε̄).
- Eigenvalue Refinement: Coarse and fine bisection in the working precision.
- Main Loop: Queue tasks; for singletons, twisted-factorization RQI in the higher precision until the residual threshold is met; for clusters, form child RRRs.
- Backconvert: Round the final eigenpairs to the working precision; enforce orthonormality if desired.
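The singleton step (the twisted-factorization solve, without the surrounding RQI loop) can be sketched as follows. For brevity this illustrative code factors T − λI directly instead of working from an L D L^T representation, and it does not guard against zero pivots as production codes do:

```python
def twisted_eigenvector(d, e, lam):
    """Eigenvector of the symmetric tridiagonal T (diagonal d,
    off-diagonal e) for an accurate eigenvalue approximation lam, via a
    twisted factorization: forward and backward pivot sequences of
    T - lam*I are combined at the index k minimizing |gamma_k|."""
    n = len(d)
    # Forward pivots: T - lam*I = L+ D+ L+^T
    dp = [0.0] * n
    dp[0] = d[0] - lam
    for i in range(1, n):
        dp[i] = (d[i] - lam) - e[i - 1] ** 2 / dp[i - 1]
    # Backward pivots: T - lam*I = U- D- U-^T
    dm = [0.0] * n
    dm[n - 1] = d[n - 1] - lam
    for i in range(n - 2, -1, -1):
        dm[i] = (d[i] - lam) - e[i] ** 2 / dm[i + 1]
    # Twist index: where the residual indicator gamma_k is smallest.
    gamma = [dp[k] + dm[k] - (d[k] - lam) for k in range(n)]
    k = min(range(n), key=lambda j: abs(gamma[j]))
    # Solve the twisted system with z[k] = 1 (Dhillon's "getvec" step).
    z = [0.0] * n
    z[k] = 1.0
    for i in range(k - 1, -1, -1):      # sweep upward through L+
        z[i] = -(e[i] / dp[i]) * z[i + 1]
    for i in range(k, n - 1):           # sweep downward through U-
        z[i + 1] = -(e[i] / dm[i + 1]) * z[i]
    norm = sum(v * v for v in z) ** 0.5
    return [v / norm for v in z]
```

For the 3×3 matrix with diagonal (2, 2, 2) and off-diagonals 1, the eigenvalue 2 has eigenvector (1, 0, −1)/√2; feeding a nearby approximation such as 2 + 10⁻⁸ recovers it to high accuracy.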
6. Comparative Landscape and Applicability
MRRR is contrasted with alternatives as follows:
- Divide–Conquer (DC): O(n³) worst-case arithmetic cost, O(n²) storage, high BLAS-3 intensity; sometimes fastest for well-deflated, full-spectrum cases at moderate core counts (≈16), but loses to MRRR for larger parallel jobs (Petschow, 2014).
- QR Iteration: Also O(n³); less favorable for large matrices or partial spectra.
- Bisection + Inverse Iteration: up to O(nk²) cost due to reorthogonalization, and poor scalability in the presence of eigenvalue clusters.
- MRRR: O(kn) cost, minimal memory, scalable; especially advantageous when k ≪ n (partial spectrum), on massive or hybrid compute resources, or when orthogonality requirements demand mixed precision (Petschow et al., 2013, Petschow, 2014).
Typical recommended scenarios for MRRR or mixed-precision MRRR:
- Large-scale problems, where MRRR is fastest and high-quality eigenvectors are needed for spectral clustering or as Rayleigh–Ritz bases.
- Multi-threaded/distributed settings facing load imbalance from eigenvalue clustering; an aggressively small gaptol disaggregates clusters and greatly enhances parallelism.
- Emerging mixed-precision hardware where higher-precision operations are no longer performance limiting relative to the working precision (Petschow et al., 2013).
7. Implementation Notes and References
- Practical implementations utilize explicit factor storage (lower-bidiagonal or twisted-factorization representations), with recommended block sizes for the dense stages (96–128 in Elemental or ScaLAPACK) (Petschow, 2014).
- Careful attention to stable spectrum shifting (typically via the DSTQDS/DQDS transforms) is required to maintain element-wise mixed relative stability.
- Depth-first recursion is particularly efficient in mixed precision, as it collapses the cluster-tree depth to a small constant (Petschow, 2014).
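The differential stationary qd transform with shift mentioned above can be sketched in a few lines (illustrative pure Python; real implementations add element-growth monitoring and IEEE exception handling):

```python
def dstqds(L, D, tau):
    """Differential stationary qd transform with shift: given the
    representation L*D*L^T (subdiagonal of unit lower-bidiagonal L,
    diagonal of D), compute Lp, Dp with
        Lp*Dp*Lp^T = L*D*L^T - tau*I,
    accumulating the shift through the auxiliary variable s so that
    each output entry is obtained with small element-wise error."""
    n = len(D)
    Lp = [0.0] * (n - 1)
    Dp = [0.0] * n
    s = -tau
    for i in range(n - 1):
        Dp[i] = D[i] + s
        Lp[i] = D[i] * L[i] / Dp[i]
        s = Lp[i] * L[i] * s - tau
    Dp[n - 1] = D[n - 1] + s
    return Lp, Dp
```

For L = [0.5], D = [2.0, 1.5] (so that L D L^T is the 2×2 matrix with diagonal (2, 2) and off-diagonal 1), a shift of τ = 0.5 yields factors that multiply back to exactly that matrix minus 0.5·I.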
Principal references include:
- Dhillon & Parlett, SIAM J. Matrix Anal. Appl. 2004 (original algorithm).
- Willems & Lang, BIT 2011 (perturbation theory).
- Bientinesi et al., PMRRR (parallel implementation).
- Vömel, LAPACK Working Note 2010 (ScaLAPACK’s MRRR).
- (Petschow et al., 2013) for mixed-precision methodology and extensive accuracy/performance data.
- (Petschow, 2014) for multi-core/distributed algorithms, tuning, and comparative studies.
MRRR, particularly when augmented with mixed-precision techniques, now provides eigensolvers that are simultaneously scalable, accurate, and memory-efficient, enabling robust large-scale spectral computations in both traditional and emerging computational environments.