
Retraction-Free Orthonormality-Preserving Schemes

Updated 21 January 2026
  • Retraction-free methods are algorithmic frameworks that maintain orthonormality without explicit QR, SVD, or manifold retractions.
  • They employ techniques such as penalty functions, analytic parameterizations, component-wise updates, and recursive Gram-Schmidt steps to enforce the constraints either exactly or asymptotically.
  • These schemes offer computational advantages with reduced complexity, GPU-friendly implementations, and strong convergence guarantees in large-scale optimization.

A retraction-free orthonormality-preserving scheme is any algorithmic framework for constrained optimization or numerical simulation on orthogonality- or Stiefel-constrained sets (e.g., $X^\top X = I$, $Q^\top Q = I$, or $\langle \psi_i, \psi_j \rangle = \delta_{ij}$) that guarantees preservation or asymptotic enforcement of orthonormality constraints without invoking explicit manifold retractions, QR updates, or SVD-based projections at each iteration. These approaches are broadly motivated by the prohibitive computational cost, lack of GPU-friendliness, and complexity overhead associated with classical retraction steps in Riemannian optimization and manifold-based numerical PDEs. Retraction-free methods leverage penalty functions, analytic parameterizations, sequential updates, and tailored variational structures to achieve feasibility efficiently, either exactly or in the limit.

1. Algorithmic Principles and Taxonomy

Retraction-free orthonormality-preserving schemes can be classified according to the underlying mechanism used to enforce the constraints:

  • Penalty-based landing field methods: Iterates are driven by the sum of a Riemannian gradient and a penalty gradient targeting feasibility (e.g., $\Lambda(X) = \operatorname{grad} f(X) + \lambda X(X^\top X - I)$) (Sun et al., 2024, Song et al., 3 Jun 2025).
  • Component-wise constrained splitting: Component updates are architected so that projection onto the feasible set occurs through algebraic or variational manipulations, with the update formula itself encoding orthonormality (Wang et al., 2023, Zhang et al., 13 Jan 2026).
  • Analytic parameterizations: Orthogonal matrices are represented by unconstrained parameters, such as unit lower-triangular factors (PLR decomposition) or sequences of Givens rotations, and optimized without additional projection (Bagnato et al., 2019, Shalit et al., 2013).
  • Exact penalty and constraint-dissolving maps: Constraint satisfaction is enforced by augmenting the objective with a strongly weighted penalty term and operating over a convex superset, with iterates projected only onto the superset; feasibility is recovered asymptotically (Aybat et al., 24 Oct 2025).
  • Recursive Gram-Schmidt updates: Only local orthonormalization with respect to the newest chosen vector is applied to the current set, avoiding global retractions (Xue et al., 2015).

A defining property is that at no stage is a global QR or SVD applied to the iterate to restore feasibility; instead, the scheme guarantees feasibility through its update structure, either exactly or up to machine epsilon.

2. Representative Algorithms and Analytic Parametrizations

Several prototype algorithms are canonical in the literature.

eOMP—Recursive Orthonormalization for Sparse Coding:

In "eOMP: Finding Sparser Representation by Recursively Orthonormalizing the Remaining Atoms" (Xue et al., 2015), each iteration applies a rank-one orthonormalization (single Gram-Schmidt step) to all remaining candidate atoms:

$$\psi_j^{(t)} = \frac{\psi_j^{(t-1)} - \langle q_{t-1}, \psi_j^{(t-1)}\rangle\, q_{t-1}}{\left\|\psi_j^{(t-1)} - \langle q_{t-1}, \psi_j^{(t-1)}\rangle\, q_{t-1}\right\|_2}$$

Only local correction is used. No full retraction is needed to maintain orthogonality among selected and remaining atoms.
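A minimal NumPy sketch of this step (array shapes and the function name are illustrative, not from the paper): one rank-one Gram-Schmidt correction applied to all remaining atoms at once.

```python
import numpy as np

def orthonormalize_remaining(atoms, q):
    """One rank-one Gram-Schmidt step: project the newest selected
    (unit-norm) atom q out of every remaining atom, then renormalize.
    atoms: (n, m) array whose columns are the remaining candidate atoms."""
    proj = atoms - np.outer(q, q @ atoms)    # remove the component along q
    return proj / np.linalg.norm(proj, axis=0)

rng = np.random.default_rng(0)
D = rng.standard_normal((16, 8))
D /= np.linalg.norm(D, axis=0)               # unit-norm dictionary atoms

q = D[:, 0]                                  # atom chosen this iteration
rest = orthonormalize_remaining(D[:, 1:], q)

print(np.max(np.abs(q @ rest)))              # ~0: all remaining atoms orthogonal to q
print(np.linalg.norm(rest, axis=0))          # all ones: atoms stay unit-norm
```

Because only the component along the newest atom is removed, each iteration costs a single rank-one update rather than a full re-orthonormalization of the selected set.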

PLR Decomposition—Unconstrained Parametrization:

In "Unconstrained representation of orthogonal matrices with application to common principal components" (Bagnato et al., 2019), the orthogonal matrix $Q$ is parametrized by $d(d-1)/2$ unconstrained entries via

$$Q = P L R^{-1}$$

where $L$ is unit lower-triangular and $R$ is the triangular factor from the QR decomposition of $PL$. Classical unconstrained optimizers operate directly on the $\ell_{ij}$ parameters.
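A small numerical check of this construction, taking $P = I$ for simplicity (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
# d(d-1)/2 unconstrained parameters fill the strict lower triangle of L
theta = rng.standard_normal(d * (d - 1) // 2)
L = np.eye(d)
L[np.tril_indices(d, k=-1)] = theta           # unit lower-triangular factor

# QR decomposition of L (P = I here): L = Q R, so Q = L R^{-1}
# is orthogonal by construction, with no projection step needed.
_, R = np.linalg.qr(L)
Q = L @ np.linalg.inv(R)

print(np.linalg.norm(Q.T @ Q - np.eye(d)))    # ~0: Q is orthogonal
```

Any gradient step on the free entries `theta` yields a new $L$, and hence a new exactly orthogonal $Q$, which is what makes the parametrization compatible with off-the-shelf unconstrained optimizers.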

Landing and Penalty Gradient Schemes:

Recent decentralized and distributed optimization approaches replace retraction steps with gradient-penalty descent on the Stiefel manifold (Sun et al., 2024, Song et al., 3 Jun 2025):

$$X^{k+1} = X^{k} - \gamma \left[\operatorname{grad} f(X^k) + \lambda X^k (X^{k\top} X^k - I)\right]$$

Orthonormality is preserved or recovered asymptotically as the penalty parameter $\lambda$ (or $\rho$ in exact penalty methods) is increased.
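A toy sketch of such an update on a small problem; the objective $f(X) = -\tfrac12\operatorname{tr}(X^\top A X)$, the projected-gradient form, and all step sizes are illustrative choices, not taken from the cited papers:

```python
import numpy as np

def landing_step(X, egrad, gamma, lam):
    """Retraction-free update: a (projected) Riemannian gradient term plus
    a penalty gradient pulling X back toward X^T X = I."""
    G = egrad - X @ (X.T @ egrad + egrad.T @ X) / 2    # tangent-space part
    penalty = X @ (X.T @ X - np.eye(X.shape[1]))       # feasibility part
    return X - gamma * (G + lam * penalty)

rng = np.random.default_rng(2)
W = rng.standard_normal((8, 8))
A = W.T @ W / 8                                        # toy PSD matrix
X = rng.standard_normal((8, 3)) * 0.5                  # deliberately infeasible start

for _ in range(2000):
    X = landing_step(X, -A @ X, gamma=0.02, lam=2.0)   # f(X) = -tr(X^T A X)/2

print(np.linalg.norm(X.T @ X - np.eye(3)))             # small: feasibility recovered
```

No QR or SVD appears anywhere in the loop; the penalty term alone drives the iterate back to the manifold while the gradient term makes progress on $f$.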

Component-wise Splitting for Gradient Flow:

Electronic structure methods such as (Wang et al., 2023) perform Gauss-Seidel-like updates for each wavefunction $\psi_k$, with algebraic structure ensuring mutual orthogonality and normalization.
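The flavor of such a sequential update can be sketched with a toy quadratic energy; the Hamiltonian, step size, and local-projection details below are illustrative, not the actual scheme of the cited paper:

```python
import numpy as np

def gauss_seidel_sweep(Psi, H, tau=0.1):
    """One Gauss-Seidel-like sweep: each column psi_k takes a gradient step
    for a toy energy <psi, H psi>, then is orthogonalized only against the
    already-updated columns and renormalized, so orthonormality is restored
    locally rather than by a global QR of the whole block."""
    n, p = Psi.shape
    for k in range(p):
        psi = Psi[:, k] - tau * H @ Psi[:, k]     # descent step on column k
        for j in range(k):                        # project out earlier orbitals
            psi -= (Psi[:, j] @ psi) * Psi[:, j]
        Psi[:, k] = psi / np.linalg.norm(psi)
    return Psi

rng = np.random.default_rng(3)
H = rng.standard_normal((10, 10)); H = (H + H.T) / 2   # toy symmetric "Hamiltonian"
Psi, _ = np.linalg.qr(rng.standard_normal((10, 4)))    # orthonormal start

for _ in range(50):
    Psi = gauss_seidel_sweep(Psi, H)

print(np.linalg.norm(Psi.T @ Psi - np.eye(4)))         # ~0 after every sweep
```

By induction over $k$, each updated column is orthonormal to all earlier ones, so the full sweep returns an exactly orthonormal block without ever factorizing it.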

Givens Rotation Coordinate Descent:

In (Shalit et al., 2013), optimization over the orthogonal group is performed via individual Givens-rotation steps, each of which strictly preserves orthogonality:

$$U_{t+1} = U_t\, G(i, j, \theta_{t+1})$$

No auxiliary projection or retraction is required.
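A minimal sketch of this invariance (the planes and angles below are arbitrary; the actual method selects them by coordinate descent on the objective):

```python
import numpy as np

def givens(d, i, j, theta):
    """d x d Givens rotation acting in the (i, j) coordinate plane."""
    G = np.eye(d)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = G[j, j] = c
    G[i, j], G[j, i] = -s, s
    return G

rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # start on the orthogonal group

# A few rotation steps: orthogonality is preserved exactly by
# construction, for any choice of plane (i, j) and angle theta.
for i, j, theta in [(0, 1, 0.3), (2, 5, -1.2), (1, 4, 2.0)]:
    U = U @ givens(6, i, j, theta)

print(np.linalg.norm(U.T @ U - np.eye(6)))         # ~0: exact up to rounding
```

Since each rotation touches only two columns of $U$, an update costs $\mathcal{O}(d)$ arithmetic rather than a full matrix factorization.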

3. Theoretical Guarantees and Convergence Properties

Retraction-free schemes have rigorous convergence guarantees in various contexts:

  • Feasibility and constraint satisfaction: Solutions generated by penalty or exact penalty reformulations converge so that $X_k^\top X_k \to I$ or $\langle \psi_i, \psi_j \rangle \to \delta_{ij}$ as $k \to \infty$ (Sun et al., 2024, Aybat et al., 24 Oct 2025).
  • Global convergence rates: For decentralized nonconvex optimization on the Stiefel manifold, DRFGT attains the optimal $\mathcal{O}(1/K)$ ergodic rate and converges linearly under local Riemannian PL conditions (Sun et al., 2024).
  • Stationarity guarantees: In sm-MGDA (Aybat et al., 24 Oct 2025), any limit point of the penalized iterates is a true stationary point for the original manifold problem, with the penalty ensuring vanishing constraint violation in the limit.
  • Unconditional energy stability: The component-wise splitting for Kohn–Sham equations preserves orthonormality exactly and is energy stable for arbitrarily large time-steps (Wang et al., 2023).
  • Stability and error bounds in PDEs: The fully discrete orthonormality-preserving scheme for spatial-temporal saddle search (HiSD) yields uniform bounds and first-order-in-time, second-order-in-space error estimates, and preserves the Morse index (Zhang et al., 13 Jan 2026).
  • Empirical convergence: Givens-rotation coordinate descent displays a sublinear convergence rate to critical points and competitive empirical performance in sparse PCA and tensor decomposition (Shalit et al., 2013).

4. Numerical Stability and Implementation Considerations

Distinct advantages arise in terms of stability and implementation:

  • Many schemes avoid floating-point drift found in repeated Gram-Schmidt or QR orthonormalization by intrinsically encoding projections in the update rule (Wang et al., 2023, Zhang et al., 13 Jan 2026).
  • Penalty-gradient methods maintain iterates in a “safe region” (e.g., $\|X^\top X - I\| < \epsilon$), with uniform step-size constraints ensuring no loss of feasibility (Song et al., 3 Jun 2025). The penalty gradient and canonical Riemannian gradient are mutually orthogonal.
  • The use of unconstrained parameterizations (PLR, Givens) enables compatibility with unconstrained optimizers, facilitating easy implementation in automatic differentiation frameworks (Bagnato et al., 2019, Shalit et al., 2013).
  • GPU efficiency: Exact penalty and landing methods are BLAS- and autodiff-friendly, avoiding QR/SVD steps unsuitable for hardware acceleration (Aybat et al., 24 Oct 2025).
  • In the presence of weak orthogonality drift, occasional re-orthogonalization can be implemented as a backup without interfering with convergence guarantees (Xue et al., 2015).
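The orthogonality of the penalty and gradient components can be verified numerically. In the square orthogonal-group landing formulation the gradient part has the form $\Psi X$ with $\Psi$ skew-symmetric, and $\langle \Psi X,\, X(X^\top X - I)\rangle_F = -\operatorname{tr}\big((X^\top \Psi X)(X^\top X - I)\big) = 0$, since a skew matrix traced against a symmetric one vanishes; a quick check (all names illustrative), which holds even far from the manifold:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 7
X = rng.standard_normal((d, d))                  # deliberately off-manifold
A = rng.standard_normal((d, d))
Psi = (A - A.T) / 2                              # skew-symmetric gradient factor

riem_component = Psi @ X                         # "gradient" part of the landing field
penalty_component = X @ (X.T @ X - np.eye(d))    # feasibility part

# Frobenius inner product: tr((Psi X)^T X N) = -tr((X^T Psi X) N) = 0,
# since X^T Psi X is skew and N = X^T X - I is symmetric.
ip = np.sum(riem_component * penalty_component)
print(abs(ip))                                   # ~0 up to rounding
```

This decomposition is what lets the penalty term control feasibility without fighting the descent direction on the objective.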

5. Computational Complexity and Communication

Retraction-free schemes typically offer computational and communication advantages:

  • Algorithm overhead: eOMP incurs a modest $4\times$ to $5\times$ increase in per-iteration complexity versus vanilla OMP, but retains $\mathcal{O}(NMs)$ scaling, supporting large dictionaries (Xue et al., 2015).
  • Penalty-based decentralized algorithms: DRFGT and EF-Landing have $2\times$ to $3\times$ lower per-iteration arithmetic complexity than retraction-based schemes, without requiring matrix factorizations per step (Sun et al., 2024, Song et al., 3 Jun 2025).
  • Communication-efficient design: EF-Landing incorporates error-feedback compression and block-wise decomposition, reducing communication by up to $10\times$ with no loss in convergence rate (Song et al., 3 Jun 2025).
  • Structure-preserving updates: Givens coordinate-descent enables $\mathcal{O}(d)$-cost per update for orthogonal matrices, supporting large-scale matrix optimization (Shalit et al., 2013).
  • Variational PDE schemes: Component-wise splitting and space-time saddle search admit updates via linear solves per variable, with decoupled structure allowing parallelization and rapid convergence (Wang et al., 2023, Zhang et al., 13 Jan 2026).

6. Applications and Extensions

Applications span signal processing, statistical machine learning, data analysis, and electronic structure computation.

  • Sparse representation and compressed sensing: eOMP maximizes residual reduction and achieves sparser decompositions in coding tasks while improving recovery rates on Gaussian ensembles (Xue et al., 2015).
  • Statistical inference: PLR decomposition enables unconstrained optimization for CPCA and robust estimation of common orthogonal matrices under heavy-tailed distributions (Bagnato et al., 2019).
  • Networked and distributed learning: DRFGT and EF-Landing facilitate decentralized learning on the Stiefel manifold at scale, with practical efficacy demonstrated for block-wise PCA and deep-network layer orthogonality (Sun et al., 2024, Song et al., 3 Jun 2025).
  • Nonconvex–concave and minimax problems: Retraction-free exact penalty methods extend to nonsmooth and dual adaptive scenarios, preserving feasibility asymptotically and supporting efficient primal-dual updates (Aybat et al., 24 Oct 2025).
  • Tensor and latent variable models: Givens rotation frameworks yield scalable methods for sparse PCA and high-dimensional tensor decomposition, with provably optimal stationary points (Shalit et al., 2013).
  • Kohn–Sham DFT and nonlinear PDEs: Component-wise splitting and space-time saddle dynamics schemes ensure exact preservation of physical constraints (e.g., orbital orthonormality, Morse index) while supporting fast iterative solvers for electronic structure and multiple-solution search in semilinear elliptic problems (Wang et al., 2023, Zhang et al., 13 Jan 2026).

A plausible implication is that as optimization scales and hardware architectures trend toward massive parallelization, retraction-free methods will increasingly supplant classical manifold algorithms for structured matrix constraints, particularly in domains where computational and communication load are limiting.
