Diagonal Plus Low-Rank Parameterization
- DPLR parameterization is a matrix modeling approach that decomposes a matrix into a diagonal (local variability) and a low-rank (global structure) component, improving interpretability and performance.
- It finds applications in covariance estimation, precision matrices, operator approximations, and recommendation systems, offering scalable and efficient computational methods.
- Advanced estimation techniques, including blockwise coordinate descent and convex optimization, ensure precise recovery with robust error bounds in high-dimensional settings.
The Diagonal-Plus-Low-Rank (DPLR) parameterization refers to the modeling, analysis, and estimation of matrices—typically covariance, precision, kernel, or operator matrices—as a sum of a diagonal matrix and a low-rank matrix. The core structural assumption is that the target matrix exhibits both (a) global shared structure (captured by the low-rank term) and (b) localized or per-coordinate variability (captured by the diagonal). This structure is ubiquitous in high-dimensional statistics, kernel learning, signal processing, matrix approximation, machine learning systems, and large-scale scientific computing.
1. Mathematical Formulation and Variants
DPLR decomposes a symmetric or Hermitian matrix $M$ as
$$M = D + L,$$
where $D$ is a diagonal matrix ($D = \operatorname{diag}(d_1, \dots, d_n)$) and $L$ is a low-rank positive semidefinite (PSD) matrix ($\operatorname{rank}(L) = r \ll n$). For general linear operators or kernel matrices, $L$ may be factored as $L = UU^\top$ or $L = UV^\top$.
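The computational payoff of the parameterization is that $M$ never has to be formed explicitly: storage is $O(nr)$ and matrix-vector products cost $O(nr)$ rather than $O(n^2)$. A minimal numpy sketch (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 4
d = rng.uniform(0.5, 2.0, size=n)   # diagonal entries of D (positive)
U = rng.standard_normal((n, r))     # low-rank factor, L = U @ U.T

def dplr_matvec(d, U, x):
    """Compute (diag(d) + U @ U.T) @ x in O(n r) time without forming the matrix."""
    return d * x + U @ (U.T @ x)

x = rng.standard_normal(n)
dense = np.diag(d) + U @ U.T        # explicit matrix, for verification only
assert np.allclose(dplr_matvec(d, U, x), dense @ x)
```

The same factored representation underlies all of the estimation algorithms below.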
Several commonly encountered DPLR forms include:
- Covariance estimation: $\Sigma = L + D$, where $\Sigma$ is a population covariance, $L$ PSD and low-rank, $D$ diagonal and positive-definite (Wu et al., 2018).
- Precision matrix (inverse covariance): $\Omega = \Sigma^{-1} = L + D$, with $L$ PSD, $D$ diagonal and positive-definite (Wu et al., 2018).
- Operator approximation: $A \approx D + UV^\top$, with $U, V \in \mathbb{R}^{n \times r}$ (Fernandez et al., 28 Sep 2025).
- Factor analysis: $\Sigma = \Psi + \Lambda\Lambda^\top$, $\Psi$ diagonal, $\Lambda \in \mathbb{R}^{n \times r}$ (Khamaru et al., 2018).
This parameterization generalizes both pure low-rank (global factor) and diagonal (coordinate-wise structure) models, and is strictly more expressive than either in isolation.
2. Identifiability and Convex Recovery
Identifiability within DPLR models depends crucially on the column-space of the low-rank term. Classical criteria for uniqueness of DPLR decomposition include:
- Balance condition: In one dimension (rank one), a unit vector $u$ is "balanced" if $u_i^2 \le \sum_{j \ne i} u_j^2$ for all $i$; this is necessary and sufficient for recoverability (Saunderson et al., 2012).
- Coherence threshold: For a subspace $\mathcal{U} \subseteq \mathbb{R}^n$, the coherence is $\mu(\mathcal{U}) = \max_i \|P_{\mathcal{U}} e_i\|_2^2$; the decomposition is identifiable if $\mu(\mathcal{U}) < 1/2$ (Saunderson et al., 2012).
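Both criteria are cheap to verify numerically: since $P_{\mathcal{U}}$ is an orthogonal projector, $\|P_{\mathcal{U}} e_i\|_2^2$ is just the $i$-th diagonal entry of the projection matrix, so coherence reduces to a QR factorization. A small numpy sketch (the function name is illustrative):

```python
import numpy as np

def coherence(U):
    """Coherence of the column space of U: max_i ||P e_i||^2,
    where P is the orthogonal projector onto range(U)."""
    Q, _ = np.linalg.qr(U)   # orthonormal basis of range(U)
    P = Q @ Q.T              # projector; ||P e_i||^2 equals P[i, i]
    return float(np.max(np.diag(P)))

n = 100
spike = np.zeros((n, 1)); spike[0, 0] = 1.0   # mass concentrated on one coordinate
flat = np.ones((n, 1))                        # mass spread evenly, mu = 1/n

assert coherence(spike) > 0.5   # unidentifiable: the diagonal can absorb the spike
assert coherence(flat) < 0.5    # identifiable: well below the threshold
```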
Convex optimization procedures such as Minimum Trace Factor Analysis (MTFA), which replaces direct rank minimization with trace minimization, recover DPLR decompositions under suitable identifiability and coherence conditions. For moderate $n$, general-purpose SDP solvers such as SDPT3, SeDuMi, and MOSEK suffice; for larger $n$, operator-splitting and sketching methods exploit the structure (Saunderson et al., 2012).
The dual of this program is equivalent to searching for faces of the elliptope (the set of correlation matrices), directly linking DPLR recoverability, the facial structure of the elliptope, and geometric ellipsoid fitting (Saunderson et al., 2012).
3. Estimation Algorithms and Computational Procedures
DPLR estimation is implemented in a range of algorithms, including blockwise coordinate descent, spectral alternation, difference-of-convex optimization, and convex sketching.
- Blockwise Coordinate Descent: Alternates a closed-form $L$-update (partial eigendecomposition) and a $D$-update (log-det SDP or coordinate optimization). For covariance estimation:
  - $L$-step: set $L$ to the best rank-$r$ PSD approximation of $\hat{\Sigma} - D$, computed from a partial eigendecomposition.
  - $D$-step: minimize the log-det objective over the diagonal entries of $D$ with $L$ held fixed (Wu et al., 2018).
- Alternating Spectral Decomposition: Alternates a low-rank fit ($L \leftarrow$ the Eckart–Young best rank-$r$ approximation of $A - D$) and a diagonal update ($D \leftarrow \operatorname{diag}(A - L)$). Randomized variants (Nyström for the low-rank term, Diag++ for the diagonal) achieve scalability in the matrix-vector-product setting with rigorous error bounds (Yeon et al., 18 Dec 2025).
- Difference-of-Convex Factor Analysis (DC-FA): Rewrites the factor-analysis maximum-likelihood objective as a difference of convex functions $g - h$, where $g$ is coordinatewise convex and $h$ is a nonsmooth convex spectral function. Iterative majorization-minimization yields closed-form per-coordinate update rules exploiting spectral subgradients (Khamaru et al., 2018).
- Convex Sketched Estimation: In sketching-based recovery, SKETCHLORD solves a convex program of the form
  $$\min_{D,\,L} \; \tfrac{1}{2}\,\|\mathcal{S}(A) - \mathcal{S}(D + L)\|_F^2 + \lambda\,\|L\|_*$$
  via matrix sketches and nuclear-norm regularization, attaining exact low-rank-plus-diagonal recovery in high-dimensional operator settings (Fernandez et al., 28 Sep 2025).
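The alternating spectral scheme listed above reduces to a few lines of numpy. The following is a sketch for a dense symmetric $A$ with known rank $r$ (the randomized Nyström/Diag++ variants replace the dense eigendecomposition); the recorded errors are non-increasing because each step exactly minimizes its own block:

```python
import numpy as np

def dplr_alternate(A, r, iters=100):
    """Alternate an Eckart-Young rank-r PSD fit and a diagonal update for A ~ D + L."""
    n = A.shape[0]
    d = np.zeros(n)
    errs = []
    for _ in range(iters):
        w, V = np.linalg.eigh(A - np.diag(d))   # eigenvalues in ascending order
        w_top = np.clip(w[-r:], 0.0, None)      # top-r eigenvalues, clipped to PSD
        L = (V[:, -r:] * w_top) @ V[:, -r:].T   # best rank-r PSD approximation
        d = np.diag(A - L)                      # diagonal update
        errs.append(np.linalg.norm(A - np.diag(d) - L))
    return d, L, errs

rng = np.random.default_rng(1)
n, r = 60, 3
U = rng.standard_normal((n, r))
A = np.diag(rng.uniform(1.0, 2.0, size=n)) + U @ U.T   # exact DPLR test matrix

d, L, errs = dplr_alternate(A, r)
assert all(errs[i + 1] <= errs[i] + 1e-10 for i in range(len(errs) - 1))  # monotone
```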
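For the convex formulation, a simplified full-matrix analogue (dropping the sketching operator; an illustrative alternating scheme, not the SKETCHLORD algorithm itself) alternates an exact diagonal update with singular-value thresholding, the proximal operator of the nuclear norm:

```python
import numpy as np

def svt(M, lam):
    """Prox of lam * nuclear norm for symmetric M: soft-threshold the eigenvalues."""
    w, V = np.linalg.eigh(M)
    w = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
    return (V * w) @ V.T

def solve_dplr_nuclear(A, lam, iters=200):
    """Minimize 0.5 * ||A - D - L||_F^2 + lam * ||L||_* by exact blockwise updates."""
    n = A.shape[0]
    L = np.zeros((n, n))
    objs = []
    for _ in range(iters):
        d = np.diag(A - L)            # exact minimizer over the diagonal D
        L = svt(A - np.diag(d), lam)  # exact minimizer over L (prox step)
        objs.append(0.5 * np.linalg.norm(A - np.diag(d) - L) ** 2
                    + lam * np.abs(np.linalg.eigvalsh(L)).sum())
    return d, L, objs

rng = np.random.default_rng(2)
n, r = 40, 2
U = rng.standard_normal((n, r))
A = np.diag(rng.uniform(1.0, 2.0, size=n)) + U @ U.T
d, L, objs = solve_dplr_nuclear(A, lam=0.5)
assert all(objs[i + 1] <= objs[i] + 1e-8 for i in range(len(objs) - 1))  # convex descent
```

Because each block update is an exact minimization of a jointly convex objective, the objective sequence is guaranteed non-increasing.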
4. Theoretical Guarantees and Error Analysis
Theoretical properties of DPLR estimation encompass consistency, monotonicity, error bounds, and computational complexity:
- Consistency of Estimators: For high-dimensional covariance estimation, when the truth decomposes as $\Sigma = L + D$ with $\operatorname{rank}(L) = r$, both fixed-rank and rank-penalized estimators are consistent in Frobenius norm, at rates matching those of sparse-precision estimators (Wu et al., 2018).
- Monotone Frobenius Decrease: Alternating algorithms decrease the approximation error monotonically, with local contraction in neighborhoods of true parameters (Yeon et al., 18 Dec 2025).
- Randomized Sketching Guarantees: SKETCHLORD achieves exact recovery for idealized DPLR matrices and outperforms any sequential diagonal-then-low-rank or low-rank-then-diagonal pipeline, provided the sketch dimension is sufficiently large relative to the rank $r$ (Fernandez et al., 28 Sep 2025). Non-asymptotic error bounds from matrix concentration theory apply.
5. Practical Implementations and Applications
DPLR parameterizations arise naturally in diverse application domains:
- Covariance and Precision Estimation: DPLR estimators outperform graphical lasso in KL-divergence loss, especially when the true structure is low-rank plus diagonal. In financial portfolios (Markowitz allocation), DPLR-based covariance yields portfolios with lower volatility and higher Sharpe ratio than sparse or diagonal-only benchmarks (Wu et al., 2018).
- Factorization Machines for Recommendation: DPLR-decomposed field-weight matrices in FwFM architectures reduce inference and training costs from quadratic to near-linear in the number of fields, outperforming aggressive pruning in both accuracy and latency on large-scale ad ranking (Shtoff et al., 2024).
- Eigenproblems and Polynomial Rootfinding: DPLR Hessenberg reduction via quasiseparable generators delivers low-complexity structured solvers for secular companion linearizations, with norm-wise backward stability and machine-precision eigenvalue accuracy (Bini et al., 2015).
- Large-Scale PDEs and Filtering: DPLR Riccati projection enables tractable ODE integration of very large PSD flows, preserving invertibility and enabling linear-time inversion via the Woodbury identity, which is crucial for high-dimensional Kalman filters and Bayesian inference (Bonnabel et al., 2024).
- High-Dimensional Operator Sketching: SKETCHLORD enables fast, high-fidelity recovery of deep-learning Hessians as DPLR objects from matrix-vector products, matching empirical spectra and outperforming sequential approaches (Fernandez et al., 28 Sep 2025).
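The field-weight cost reduction in the FwFM setting follows from a factorization-machine-style identity: with $R = \operatorname{diag}(d) + UU^\top$, the pairwise score $\sum_{i<j} R_{ij}\langle v_i, v_j\rangle$ can be evaluated without ever materializing $R$. A numpy sketch (variable names are illustrative, not the Shtoff et al. implementation):

```python
import numpy as np

def pairwise_naive(d, U, V):
    """Sum over i < j of R[i, j] * <v_i, v_j> with R = diag(d) + U @ U.T (quadratic cost)."""
    n = V.shape[0]
    R = np.diag(d) + U @ U.T
    return sum(R[i, j] * V[i] @ V[j] for i in range(n) for j in range(i + 1, n))

def pairwise_dplr(d, U, V):
    """Same quantity in O(n r m) time: half of (full double sum minus diagonal terms)."""
    sq = (V * V).sum(axis=1)                            # ||v_i||^2 for each field
    full = np.linalg.norm(V.T @ U, 'fro') ** 2 + d @ sq  # sum over all (i, j) pairs
    diag = (d + (U * U).sum(axis=1)) @ sq                # R[i, i] * ||v_i||^2 terms
    return 0.5 * (full - diag)

rng = np.random.default_rng(3)
n, r, m = 30, 2, 8               # fields, DPLR rank, embedding dimension
d = rng.standard_normal(n)
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, m))  # one embedding vector per field
assert np.isclose(pairwise_naive(d, U, V), pairwise_dplr(d, U, V))
```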
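The linear-time inversion mentioned for Riccati flows rests on the Woodbury identity: solving with $D + UU^\top$ requires inverting only an $r \times r$ capacitance matrix. A minimal numpy sketch:

```python
import numpy as np

def dplr_solve(d, U, b):
    """Solve (diag(d) + U @ U.T) x = b in O(n r^2) via the Woodbury identity."""
    Dinv_b = b / d
    Dinv_U = U / d[:, None]
    cap = np.eye(U.shape[1]) + U.T @ Dinv_U   # r x r capacitance matrix
    return Dinv_b - Dinv_U @ np.linalg.solve(cap, U.T @ Dinv_b)

rng = np.random.default_rng(4)
n, r = 200, 5
d = rng.uniform(0.5, 2.0, size=n)   # positive diagonal keeps D invertible
U = rng.standard_normal((n, r))
b = rng.standard_normal(n)
x = dplr_solve(d, U, b)
assert np.allclose((np.diag(d) + U @ U.T) @ x, b)
```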
6. Methodological Considerations and Extensions
Optimization and estimation strategies for DPLR models reflect tradeoffs in computational complexity, rank selection, and regularization:
- Rank selection: Penalized likelihood, cross-validation, and ablations on operator residuals are used to select the rank $r$.
- Regularization: Norm penalties, such as the trace norm on the low-rank factor, applied to both the diagonal and low-rank terms stabilize training and estimation.
- Structure extension: Methodologies generalize to block-diagonal plus low-rank, banded plus low-rank, or block-sparse plus factor models.
- Dynamic adaptation: In ODE flows, one may adjust rank or diagonal structure dynamically based on error monitoring (Bonnabel et al., 2024).
- Hyperparameter tuning: Sketch dimension, regularization weights, and learning rates are tuned by application-specific canary tests and cross-validation (Shtoff et al., 2024, Fernandez et al., 28 Sep 2025).
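As one concrete illustration of rank selection, a cheap proxy is the spectral gap between the low-rank part and the diagonal bulk (this heuristic and its threshold are illustrative assumptions, not a method from the cited works):

```python
import numpy as np

def select_rank(A, gap=3.0):
    """Heuristic rank selection: count eigenvalues exceeding `gap` times the median,
    exploiting the gap between low-rank eigenvalues and the diagonal bulk."""
    w = np.linalg.eigvalsh(A)
    return int(np.sum(w > gap * np.median(w)))

rng = np.random.default_rng(5)
n, r_true = 40, 3
U = rng.standard_normal((n, r_true))
A = np.diag(rng.uniform(1.0, 2.0, size=n)) + U @ U.T
assert select_rank(A) == r_true   # three eigenvalues stand well above the bulk
```

In practice, cross-validation or penalized likelihood replaces such gap heuristics when the spectrum decays smoothly.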
7. Comparative Analysis and Empirical Performance
Empirical studies consistently demonstrate that DPLR estimators, when the true matrix exhibits both global and local structure, outperform pure low-rank, pure diagonal, and sequential hybrid approximations. Key performance attributes include:
- Accuracy: Lower approximation error and improved metrics (AUC, LogLoss, KL-loss) relative to comparator methods.
- Scalability: Linear or near-linear scaling in the ambient dimension $n$ or the number of fields, crucial for both scientific and commercial datasets.
- Efficiency: Reduced memory and compute footprint, especially when $r \ll n$, enabling deployment in latency-constrained data paths or massive-scale inference engines (Shtoff et al., 2024, Fernandez et al., 28 Sep 2025).
- Numerical stability: DPLR methods maintain backward-stability and controlled error propagation in large-scale linear algebra (Bini et al., 2015).
The DPLR parameterization is thus foundational in modern high-dimensional statistics, scalable operator approximation, kernel learning, and recommendation systems, with well-understood identifiability, efficient computational frameworks, and empirical superiority in complex real-world applications.