
Cholesky Factor Quantization

Updated 27 January 2026
  • Cholesky factor quantization is the process of representing and computing Cholesky factors in low-precision (fp16) arithmetic for SPD systems while managing rounding errors and overflow.
  • The approach integrates symmetric diagonal scaling, look-ahead, and global shift strategies to stabilize incomplete Cholesky factorizations and mitigate quantization-induced breakdowns.
  • Mixed-precision iterative refinement leveraging fp16-based preconditioners demonstrates full double-precision accuracy with significant memory and computational efficiency for large, ill-conditioned matrices.

Cholesky factor quantization refers to the representation and computation of Cholesky factors in reduced-precision (quantized) floating-point arithmetic, with particular attention to the robustness and numerical stability of incomplete Cholesky (IC) factorizations in very low precision such as IEEE-754 half precision (fp16). This approach is especially relevant to large-scale, ill-conditioned, symmetric positive definite (SPD) linear systems. The central challenge involves maintaining the effectiveness of IC-based algebraic preconditioners while avoiding breakdowns and loss of preconditioner quality due to quantization-induced rounding error and overflow in limited-precision formats. Recent advances address these difficulties using algorithmic modifications tailored to the unique properties of half-precision arithmetic (Scott et al., 2024).

1. Incomplete Cholesky Factorization and Half-Precision Quantization

Given a symmetric positive definite matrix $A \in \mathbb{R}^{n \times n}$, the incomplete Cholesky factorization computes a sparse lower-triangular matrix $L$ such that $A \approx L L^T$, restricted to a prescribed sparsity pattern $\mathcal{S}\{L\}$. Entries are computed via outer-product updates, analogous to the complete Cholesky factorization but with fill entries outside $\mathcal{S}\{L\}$ dropped.
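The outer-product scheme with dropping can be sketched as follows. This is a minimal dense-storage illustration, not the paper's implementation; the pattern $\mathcal{S}\{L\}$ is taken to be the lower triangle of $A$ itself (i.e., IC(0)):

```python
import numpy as np

def ic0(A):
    """Incomplete Cholesky with the sparsity pattern of A (IC(0)), dense storage.

    Returns L with A ~ L @ L.T; fill-in outside the pattern of A is dropped.
    """
    n = A.shape[0]
    pattern = A != 0.0                      # prescribed sparsity pattern S{L}
    L = np.tril(A).astype(float)
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])          # pivot
        L[k+1:, k] /= L[k, k]               # scale column k
        for j in range(k + 1, n):           # outer-product update, with dropping
            for i in range(j, n):
                if pattern[i, j]:
                    L[i, j] -= L[i, k] * L[j, k]
    return L

# Tridiagonal SPD example: the exact factor has no fill-in, so IC(0) is exact.
A = np.diag(4.0 * np.ones(5)) + np.diag(-np.ones(4), -1) + np.diag(-np.ones(4), 1)
L = ic0(A)
print(np.allclose(L @ L.T, A))  # True: no fill to drop for a tridiagonal matrix
```

For a matrix whose complete factor does generate fill, `L @ L.T` only approximates `A`, which is exactly what makes the factor useful as a preconditioner rather than a direct solver.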

In the quantized setting, every real $x$ is replaced by $\mathrm{fl}_{1/2}(x) = x(1+\delta)$ with $|\delta| \leq u_{1/2}$, where $u_{1/2} = 2^{-11} \approx 4.88 \times 10^{-4}$ in fp16. Overflow to $\pm\infty$ occurs when magnitudes exceed $x_{\max} \approx 6.55 \times 10^{4}$. Thus, quantization of the Cholesky factor $L$ is modeled by applying $\mathrm{fl}_{1/2}(\cdot)$ after every arithmetic operation and monitoring intermediates for overflow. This structural quantization is integral to characterizing algorithmic breakdowns and the behavior of preconditioners in fp16.
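This rounding-and-overflow model can be simulated directly with NumPy's `float16` type. The sketch below is illustrative (`fl_half` and the overflow exception are our names, not the paper's):

```python
import numpy as np

FP16_MAX = float(np.finfo(np.float16).max)   # x_max = 65504 ~ 6.55e4
U_HALF = 2.0 ** -11                          # unit roundoff u_{1/2}

def fl_half(x):
    """Round x to fp16, flagging overflow to +/-inf (a B3-style breakdown)."""
    y = np.float16(x)
    if np.isinf(y) and np.isfinite(x):
        raise OverflowError(f"fp16 overflow quantizing {x}")
    return y

x = 1.0 + 3e-4                    # representable only approximately in fp16
err = abs(float(fl_half(x)) - x) / x
print(err <= U_HALF)              # True: relative error bounded by u_{1/2}
# fl_half(7.0e4)                  # would raise: exceeds x_max ~ 6.55e4
```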

2. Prescaling for Robustness in Low-Precision Arithmetic

Symmetric diagonal scaling is a classical preprocessing step that reduces the adverse effects of low precision on numerical stability. Defining $D = \mathrm{diag}(d_1^{-1/2}, \dots, d_n^{-1/2})$ with $d_i = \lVert \mathrm{row}_i(A)\rVert_2$, the scaled matrix $\widehat{A} = DAD$ satisfies $\kappa_2(\widehat{A}) \leq \kappa_2(A)$ and $\max_{i,j}|\widehat{A}_{ij}| \leq 1$, where $\kappa_2$ denotes the spectral condition number. Performing the factorization on $\widehat{A}$ yields $\widehat{L}$ with better-bounded entries, so the fp16-quantized factor suffers less entry growth and fewer overflows. Since $A = D^{-1}\widehat{A}D^{-1}$, the preconditioner factor for the original system is recovered as $L = D^{-1}\widehat{L}$, making scaling an effective mitigation of quantization effects (Scott et al., 2024).
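A minimal sketch of this scaling (illustrative code, not the paper's; only the entry bound is checked here):

```python
import numpy as np

def symmetric_scale(A):
    """Two-sided scaling A_hat = D A D with d_i = ||row_i(A)||_2^{-1/2}."""
    d = 1.0 / np.sqrt(np.linalg.norm(A, axis=1))   # diagonal of D
    A_hat = d[:, None] * A * d[None, :]            # same as diag(d) @ A @ diag(d)
    return A_hat, d

rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B @ B.T + 50.0 * np.eye(50)                    # SPD test matrix
A_hat, d = symmetric_scale(A)
print(np.max(np.abs(A_hat)) <= 1.0)                # True: entries bounded by 1
```

The entry bound follows from $|a_{ij}| \leq \sqrt{\lVert \mathrm{row}_i \rVert_2 \, \lVert \mathrm{row}_j \rVert_2}$, which is what keeps the scaled matrix, and hence its factor, comfortably inside the fp16 range.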

3. Breakdown Avoidance: Look-Ahead and Global Shift Strategies

Breakdowns in IC factorization under quantization primarily manifest as:

  • B1: Small or negative pivots ($l_{kk} < \tau_u$), critical for fp16 computations, where $\tau_u = 10^{-5}$.
  • B3: Overflow during computation (e.g., due to excessive entry growth).

Look-ahead proactively evaluates would-be pivots $\widetilde{\ell}_{jj}$ before committing to updates, using only safe fp16 operations. If $\widetilde{\ell}_{jj} < \tau_u$ for any step $j \geq k$, a B1 breakdown is declared preemptively at step $k$, avoiding wasted computation.

The global shift remedy replaces $\widehat{A} \rightarrow \widehat{A} + \sigma I$, increasing all diagonal entries and ensuring that every pivot remains strictly positive provided $\sigma$ exceeds the magnitude of the most negative would-be pivot. Applied iteratively, with an initial shift $\alpha_S \approx 10^{-3}$ that is doubled on each continued breakdown, this technique robustly stabilizes the IC process, though care is needed not to degrade preconditioner quality through excessive shifting.
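The shift-and-retry loop can be sketched as follows. All names here are illustrative, and a plain dense Cholesky with a pivot threshold stands in for the fp16 IC factorization:

```python
import numpy as np

def shifted_factorize(A_hat, factorize, alpha_s=1e-3, max_tries=20):
    """Global-shift wrapper: retry factorizing A_hat + sigma*I,
    doubling sigma after each breakdown. `factorize` raises on breakdown."""
    sigma = 0.0
    for _ in range(max_tries):
        try:
            return factorize(A_hat + sigma * np.eye(A_hat.shape[0])), sigma
        except (ValueError, OverflowError):      # B1 / B3 breakdown
            sigma = alpha_s if sigma == 0.0 else 2.0 * sigma
    raise RuntimeError("factorization failed despite shifting")

def chol_with_breakdown(M, tau=1e-5):
    """Complete Cholesky that declares B1 on a pivot below tau."""
    n = M.shape[0]
    L = np.tril(M).astype(float)
    for k in range(n):
        if L[k, k] < tau:
            raise ValueError("B1: small or negative pivot")
        L[k, k] = np.sqrt(L[k, k])
        L[k+1:, k] /= L[k, k]
        L[k+1:, k+1:] -= np.tril(np.outer(L[k+1:, k], L[k+1:, k]))
    return L

A = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-7]])    # nearly singular SPD
L, sigma = shifted_factorize(A, chol_with_breakdown)
print(sigma > 0)                                  # True: a shift was required
```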

4. Optimization-Inspired Local Modification (GMW($\beta$)) and Entry-Growth Control

The Gill–Murray–Wright (GMW) strategy, originally from modified Cholesky factorizations in numerical optimization, applies a local pivot modification:

$$l_{kk} \leftarrow \max\left\{ l_{kk},\; \left(\frac{l_{k,\max}}{\beta}\right)^{2} \right\},$$

where $l_{k,\max} = \max_{i > k} |l_{ik}|$ and $\beta > 0$ is a user-selected parameter that governs entry growth. This ensures that no off-diagonal entry in column $k$ becomes disproportionately large after the subsequent scaling. The method can be fused with look-ahead, and if no safe modification is possible (i.e., any modification would cause overflow in fp16), a new breakdown type "B4" is flagged.
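The pivot rule and its effect are easy to check numerically. In this sketch (illustrative names, not the paper's code), boosting the pivot guarantees that after dividing column $k$ by $\sqrt{l_{kk}}$ no off-diagonal exceeds $\beta$:

```python
import numpy as np

def gmw_pivot(l_kk, col_below, beta):
    """GMW(beta) modification: l_kk <- max{l_kk, (l_{k,max}/beta)^2},
    where col_below holds the entries l_{ik}, i > k."""
    l_max = np.max(np.abs(col_below)) if col_below.size else 0.0
    return max(l_kk, (l_max / beta) ** 2)

col = np.array([3.0, -7.0, 2.0])
pivot = gmw_pivot(1e-8, col, beta=10.0)     # tiny pivot gets boosted
scaled = np.abs(col) / np.sqrt(pivot)       # column after scaling by sqrt(pivot)
print(np.all(scaled <= 10.0 + 1e-9))        # True (up to roundoff): bounded by beta
```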

Theoretical entry-growth bounds are established by Lemma 3.1: if all previous columns satisfy

$$|a_{ij}| + \min\{\mathrm{nz}(i), \mathrm{nz}(j)\}\,\beta^{2} \leq x_{\max} \quad \forall\, (i,j)\in \mathcal{S}\{L\},$$

with $\mathrm{nz}(i)$ the number of nonzeros in row $i$ (over columns $1, \dots, k-1$), then no overflow (B3) can occur at step $k$. The choice of $\beta$ thus directly regulates the quantization safety of the factorization.
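The sufficient condition can be checked statically before factorizing. The sketch below (our code, not the paper's) is a coarse variant: it counts nonzeros over the full strict lower pattern rather than only columns $1,\dots,k-1$, which can only over-count and hence remains a valid sufficient check:

```python
import numpy as np

FP16_MAX = 65504.0   # x_max for fp16

def overflow_safe(A, beta):
    """Check |a_ij| + min(nz(i), nz(j)) * beta^2 <= x_max over the pattern
    of the lower triangle of A (conservative version of the Lemma 3.1 test)."""
    nz = np.count_nonzero(np.tril(A, -1), axis=1)   # nonzeros per row, strict lower
    ii, jj = np.nonzero(np.tril(A))                 # pattern S{L}
    bound = np.abs(A[ii, jj]) + np.minimum(nz[ii], nz[jj]) * beta ** 2
    return bool(np.all(bound <= FP16_MAX))

A = np.diag(4.0 * np.ones(4)) + np.diag(-np.ones(3), -1) + np.diag(-np.ones(3), 1)
print(overflow_safe(A, beta=10.0))    # True: small entries, modest beta
print(overflow_safe(A, beta=500.0))   # False: beta^2 = 2.5e5 exceeds x_max
```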

5. Mixed-Precision Iterative Refinement

Once an IC factor LL has been computed in half precision, it can be recast in double precision for use as a preconditioner in a GMRES-based iterative refinement scheme. The five-precision GMRES-IR (as described by Carson & Higham) proceeds as follows:

  1. Compute $L$ in fp16.
  2. Initialize $x^{(0)} = 0$.
  3. Iterate:

    • Compute the residual $r = b - Ax^{(m)}$ in fp64.
    • Solve $Ad = r$ via preconditioned GMRES (preconditioner applications and matrix–vector products in fp64 or fp32) to tolerance $\lVert r \rVert \leq u_{64}^{1/4}$.
    • Update $x^{(m+1)} = x^{(m)} + d$.
    • Terminate when the backward error satisfies

    $$\frac{\|b-Ax\|_\infty}{\|A\|_\infty \|x\|_\infty + \|b\|_\infty} \leq 10^{3}\, u_{64}.$$
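The refinement loop can be sketched in NumPy. This is an illustrative simplification: for brevity the inner preconditioned-GMRES solve is replaced by a direct pair of triangular solves with the fp16-quantized factor (promoted to fp64), while residuals, updates, and the termination test follow the scheme above:

```python
import numpy as np

def ir_with_fp16_factor(A, b, max_iter=50):
    """Iterative refinement driven by an fp16-quantized Cholesky factor.

    Residuals and updates are accumulated in fp64; the inner GMRES solve
    of GMRES-IR is replaced here by triangular solves with L L^T."""
    u64 = np.finfo(np.float64).eps
    L = np.linalg.cholesky(A).astype(np.float16).astype(np.float64)  # quantize, promote
    x = np.zeros_like(b)
    bw_err = np.inf
    for _ in range(max_iter):
        r = b - A @ x                                     # fp64 residual
        bw_err = np.linalg.norm(r, np.inf) / (
            np.linalg.norm(A, np.inf) * np.linalg.norm(x, np.inf)
            + np.linalg.norm(b, np.inf))
        if bw_err <= 1e3 * u64:                           # termination test
            break
        d = np.linalg.solve(L.T, np.linalg.solve(L, r))   # correction: L L^T d = r
        x = x + d
    return x, bw_err

rng = np.random.default_rng(1)
n = 30
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # well-scaled SPD test matrix
b = rng.standard_normal(n)
x, bw_err = ir_with_fp16_factor(A, b)
print(bw_err <= 1e3 * np.finfo(np.float64).eps)   # True: fp64-level backward error
```

Even though the factor carries only fp16 accuracy, the fp64 residual drives the error down geometrically, which is the mechanism behind the full double-precision accuracy reported below.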

Numerical results indicate that this mixed-precision approach, starting from an fp16 IC factor, achieves full double-precision accuracy with rapid convergence (typically 2–10 outer iterations and low hundreds of total GMRES steps) provided breakdowns are avoided during the IC computation (Scott et al., 2024).

6. Empirical Evaluation and Practical Outcomes

Experiments were run on 15 SPD matrices ($n$ from $10^{3}$ to $2.6 \times 10^{5}$; condition numbers $10^{8}$–$10^{16}$; densities $0.1\%$–$1\%$) with the following setup:

  • Symmetric $\ell_2$-norm scaling was applied, and entries with $|a_{ij}| < 10^{-5}$ were dropped (for fp16).
  • Level-2 and level-3 IC($\ell$) preconditioners were assessed in fp16 and fp64, with combinations of no look-ahead, look-ahead, the global shift (with $\sigma$ doubling), and GMW($\beta$) for $\beta = 0.5, 10, 100$.

Major findings include:

  • Without look-ahead in fp64, some IC factors exhibited massive entry growth and yielded ineffective preconditioners; look-ahead eliminated such hidden breakdowns.
  • In fp16, all breakdowns were B1 unless look-ahead was omitted, in which case catastrophic B3 overflows also manifested.
  • The global shift strategy efficiently recovered usable factors with iteration counts comparable to fp64 preconditioners.
  • GMW($\beta$) with small $\beta \approx 0.5$ avoided breakdowns but produced weaker preconditioners (larger GMRES iteration counts); values $\beta \approx 50$–$100$ balanced modification frequency and solver performance.
  • In all successful fp16-based schemes for mixed-precision iterative refinement, double-precision accuracy was reliably attained, typically requiring only twice as many GMRES steps as an fp64-based preconditioner (Scott et al., 2024).

7. Significance and Application Domain

Cholesky factor quantization in half precision provides substantial memory and (potential) speed benefits, particularly for large-scale, sparse SPD systems addressed in scientific computing and optimization. The reliability of fp16-based IC preconditioners—when enhanced with prescaling, look-ahead, and global/local modifications—enables their deployment in mixed-precision iterative solvers, facilitating high-accuracy solutions without sacrificing the resource efficiency conferred by quantization. This research outlines a robust algorithmic toolkit ensuring breakdown avoidance and preconditioner efficacy under quantized floating-point arithmetic, with demonstrated practical utility in challenging, ill-conditioned matrix regimes (Scott et al., 2024).
