
Product Riemannian Gradient Descent

Updated 4 February 2026
  • Product Riemannian Gradient Descent (PRGD) is an optimization framework that extends classical gradient descent to smooth manifold settings by leveraging intrinsic geometry.
  • It employs strategic retraction and tangent-space perturbations to escape saddle points and guarantee approximate second-order optimality.
  • Specialized PRGD variants accelerate convergence in applications like TT-format tensor completion, achieving notable reductions in iterations and computational time.

Product Riemannian Gradient Descent (PRGD) refers to a family of first-order optimization algorithms tailored to smooth constraint manifolds, extending gradient descent mechanisms from Euclidean to Riemannian settings. PRGD variants have been developed both for general non-convex smooth optimization on manifolds via perturbative techniques and for specific structured problems, notably low-rank tensor completion in the tensor-train (TT) format. Two principal forms are: (i) Perturbed Riemannian Gradient Descent, for escaping saddle points and ensuring second-order criticality in non-convex manifold optimization (Criscitiello et al., 2019); and (ii) Preconditioned Riemannian Gradient Descent, which accelerates convergence for TT-format tensor completion (Bian et al., 23 Jan 2025). Both leverage the manifold's geometry and structure through retractions, tangent-space manipulations, and strategic use of preconditioning and step alternation.

1. Riemannian Optimization Setup and Structural Assumptions

The central aim in Riemannian optimization is the minimization of a smooth function $f : \mathcal{M} \to \mathbb{R}$, where $\mathcal{M}$ is a $d$-dimensional Riemannian manifold representing problem-specific constraints (e.g., orthonormality, rank, positivity). Optimization respects manifold geometry and uses tools such as the Riemannian gradient ($\operatorname{grad} f(x) \in T_x\mathcal{M}$) and the Riemannian Hessian ($\operatorname{Hess} f(x)$, acting on the tangent space). For well-posed results, standard regularity conditions are assumed:

  • Existence of a lower bound for $f$ over $\mathcal{M}$.
  • Lipschitz continuity of the pullback gradient $\nabla \hat f_x(s)$ and pullback Hessian $\nabla^2 \hat f_x(s)$ over balls in the tangent space $T_x\mathcal{M}$.
  • A retraction mapping $\operatorname{Retr}_x : T_x\mathcal{M} \to \mathcal{M}$, which must be smooth, satisfy $\operatorname{Retr}_x(0) = x$, and possess an appropriate second-order error term.

In TT-format tensor completion, $\mathcal{M}_r^{\operatorname{tt}}$ is the smooth embedded manifold of tensors of fixed TT-rank $r$, parametrized by cores $G_k \in \mathbb{R}^{r_{k-1} \times d_k \times r_k}$ (Bian et al., 23 Jan 2025).
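For concreteness, a point of $\mathcal{M}_r^{\operatorname{tt}}$ can be materialized by contracting its cores in sequence. The following minimal NumPy sketch (an illustration, with boundary ranks $r_0 = r_m = 1$) shows the parametrization:

```python
import numpy as np

def tt_to_full(cores):
    """Contract TT cores G_k of shape (r_{k-1}, d_k, r_k) into the
    full tensor of shape (d_1, ..., d_m); boundary ranks r_0 = r_m = 1."""
    full = cores[0]                                    # shape (1, d_1, r_1)
    for G in cores[1:]:
        # contract the trailing rank index with the next core's leading rank
        full = np.tensordot(full, G, axes=([full.ndim - 1], [0]))
    return full.reshape(full.shape[1:-1])              # drop the boundary ranks

# toy example: a 4 x 5 x 6 tensor of TT-rank (1, 2, 3, 1)
rng = np.random.default_rng(0)
d, r = (4, 5, 6), (1, 2, 3, 1)
cores = [rng.standard_normal((r[k], d[k], r[k + 1])) for k in range(3)]
T = tt_to_full(cores)
print(T.shape)  # (4, 5, 6)
```

The cores store $\sum_k r_{k-1} d_k r_k = 8 + 30 + 18 = 56$ parameters here, versus $120$ entries in the full tensor, which is the compression the TT format trades on.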

2. PRGD Algorithmic Structure and Pseudocode

Two representative PRGD variants have been developed:

Perturbed RGD (Criscitiello et al., 2019) alternates between two types of updates:

  • Manifold step: if $\|\operatorname{grad} f(x_t)\| > \epsilon$, a Riemannian gradient descent step is taken, retracting back to the manifold.
  • Tangent-space perturbed step: if $\|\operatorname{grad} f(x_t)\| \leq \epsilon$, a random perturbation within a ball in the tangent space is injected, followed by several gradient steps in $T_{x_t}\mathcal{M}$ and a final retraction.

Pseudocode for PRGD proceeds as:

t ← 0
while t ≤ T do
    if ‖grad f(x_t)‖ > ε then
        x_{t+1} ← TangentialGDstep(x_t, 0, η, b, 1)
        t ← t+1
    else
        sample ξ ∼ Uniform(B_{x_t}(0, r)) in T_{x_t}𝓜
        s₀ ← η ξ
        x_{t+𝒯} ← TangentialGDstep(x_t, s₀, η, b, 𝒯)
        t ← t+𝒯
    end if
end while
The TangentialGDstep subroutine performs projected gradient descent steps in the tangent space with appropriate step size and retraction.
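A runnable sketch of this loop, instantiated for the sphere $S^{d-1}$ with the projection retraction and $f(x) = -\frac{1}{2}x^\top A x$ (the PCA example of Section 6), is given below. Two simplifications relative to the pseudocode are assumed: the inner tangent-space loop is replaced by retracted gradient steps, and the perturbation is drawn on the sphere of radius $r$ rather than uniformly in the ball:

```python
import numpy as np

def prgd_sphere(A, x0, eta=None, eps=1e-6, r=1e-3, inner=20, iters=500, seed=0):
    """Illustrative PRGD on the sphere S^{d-1} for f(x) = -0.5 x^T A x.
    Retraction: Retr_x(s) = (x + s) / ||x + s||.  When the Riemannian
    gradient is small, a random tangent perturbation is injected."""
    rng = np.random.default_rng(seed)
    retr = lambda y, s: (y + s) / np.linalg.norm(y + s)
    x = x0 / np.linalg.norm(x0)
    if eta is None:
        eta = 1.0 / (2.5 * np.linalg.norm(A, 2))   # step ~ 1/L with L = 2.5||A||
    for _ in range(iters):
        g = -(A @ x) + (x @ A @ x) * x             # grad f(x) = P_x(-Ax)
        if np.linalg.norm(g) > eps:
            x = retr(x, -eta * g)                  # manifold RGD step
        else:
            xi = rng.standard_normal(x.shape)      # random tangent perturbation
            xi -= (x @ xi) * x
            xi *= r / max(np.linalg.norm(xi), 1e-30)
            x = retr(x, xi)
            for _ in range(inner):                 # simplified inner loop
                g = -(A @ x) + (x @ A @ x) * x
                x = retr(x, -eta * g)
    return x

# demo: minimizing f drives x toward the dominant eigenvector of A
rng = np.random.default_rng(42)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = Q @ np.diag([5.0, 1.0, 0.5, 0.2, 0.1]) @ Q.T
x_star = prgd_sphere(A, rng.standard_normal(5))
v_top = np.linalg.eigh(A)[1][:, -1]                # eigh is ascending: last = top
print(abs(x_star @ v_top) > 0.99)                  # True
```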

Preconditioned RGD for TT-format tensor completion (Bian et al., 23 Jan 2025) adapts the inner product (metric) dynamically via data-dependent weighting and projects the descent direction in the resulting geometry:

  • At each iteration, the ambient gradient $G_\ell = P_\Omega(T_\ell - T^*)$ (with $P_\Omega$ the sampling operator) is computed.
  • Mode-wise weights are constructed to approximate the local geometry.
  • A tangent-space direction is determined via projection in the weighted metric, and the step is followed by retraction back to $\mathcal{M}_r^{\operatorname{tt}}$ via TT-SVD.
  • The update reads:

$$T_{\ell+1} = \operatorname{Retr}_{T_\ell}\left(T_\ell - \eta_\ell\, W_\ell^{-1} \operatorname{grad} f(T_\ell)\right)$$
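The paper's mode-wise weighted metric is specific to TT structure. As a loose, hypothetical analogue, the sketch below applies a diagonal preconditioner built from per-row/column observation counts to a matrix-completion gradient; the retraction and low-rank structure are omitted to isolate the role of $W_\ell$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
M = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))  # rank-2 target
mask = (rng.random((n, n)) < 0.3).astype(float)                # sampling pattern

def f(X):
    """Completion objective f(X) = 0.5 * ||P_Omega(X - M)||_F^2."""
    return 0.5 * np.sum((mask * (X - M)) ** 2)

def precond_step(X, eta=1.0):
    """One hypothetical diagonally preconditioned step: row/column
    observation counts stand in for the weights W (an illustrative
    analogue, not the paper's mode-wise TT construction)."""
    G = mask * (X - M)                              # ambient gradient P_Omega(X - M)
    row_w = np.maximum(mask.sum(axis=1, keepdims=True), 1.0)
    col_w = np.maximum(mask.sum(axis=0, keepdims=True), 1.0)
    return X - eta * G / np.sqrt(row_w * col_w)     # W^{-1} grad, diagonal W

X0 = np.zeros((n, n))
X1 = precond_step(X0)
print(f(X1) < f(X0))  # True: the preconditioned step decreases the objective
```

Each observed residual is shrunk by the factor $1 - 1/\sqrt{w_{\mathrm{row}} w_{\mathrm{col}}} \in [0, 1)$, so the objective decreases monotonically; the data-dependent weights slow the update where sampling is dense, mimicking how $W_\ell$ rebalances the metric.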

3. Retraction and Manifold Operators

Retraction, $\operatorname{Retr}_x(s)$, maps a vector in the tangent space at $x$ back to the manifold, satisfying $\operatorname{Retr}_x(0) = x$ with differential $D\operatorname{Retr}_x(0)$ equal to the identity. In proof arguments, the pullback function $\hat f_x(s) = f(\operatorname{Retr}_x(s))$ is used for local analysis.

In TT-format tensor completion, retraction is realized via the TT-SVD, which computes a quasi-optimal approximation of a tensor by one of the target TT-rank (Bian et al., 23 Jan 2025).

Projection from the ambient space to the tangent space is central. For TT-manifolds, the orthogonal projector $P_T$ and its weighted version $\widetilde P_{T_\ell}$ are used, enabling efficient calculation of Riemannian gradients.
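As a sanity check on the defining properties ($\operatorname{Retr}_x(0) = x$, $D\operatorname{Retr}_x(0) = \operatorname{Id}$, tangency of the projection), the sphere offers a minimal concrete case; the TT projectors play the same role but are structurally more involved:

```python
import numpy as np

# Numerical check of retraction and tangent-projection properties on the
# sphere S^{d-1} with the projection retraction (illustrative manifold).
rng = np.random.default_rng(1)
d = 6
x = rng.standard_normal(d); x /= np.linalg.norm(x)

P = np.eye(d) - np.outer(x, x)                       # projector onto T_x S^{d-1}
retr = lambda s: (x + s) / np.linalg.norm(x + s)     # projection retraction

v = rng.standard_normal(d)
s = P @ v                                            # a tangent vector: <x, s> = 0
print(abs(x @ s) < 1e-10)                            # True: s is tangent at x
print(np.allclose(retr(np.zeros(d)), x))             # True: Retr_x(0) = x

# first-order condition D Retr_x(0) = Id, checked via finite differences
t = 1e-6
print(np.linalg.norm((retr(t * s) - x) / t - s) < 1e-4)  # True
```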

4. Convergence Properties and Complexity

Under assumptions A1–A3, for any twice differentiable $f$ and retraction $\operatorname{Retr}$ meeting mild second-order error constraints, PRGD guarantees approximate second-order criticality: it visits a point $x$ with $\|\operatorname{grad} f(x)\| \leq \epsilon$ and $\lambda_{\min}(\operatorname{Hess} f(x)) \geq -\sqrt{\rho \epsilon}$ within

$$O\left((\log d)^4 / \epsilon^2\right)$$

gradient queries, matching the Euclidean PGD complexity and retaining low dimensionality dependence.

The key proof steps show:

  • RGD steps decrease ff deterministically while far from saddle points.
  • Perturbed tangent-space steps probabilistically escape saddle regions, leveraging a volume argument and randomization to ensure progress with high probability.
  • Mild retraction assumptions suffice; explicit curvature bounds on $\mathcal{M}$ are not required.

The preconditioned variant achieves linear convergence (geometric decay) independent of the condition number under tensor incoherence and spikiness assumptions, given a sufficiently good initialization and adequate sample complexity. The contraction factor (e.g., $0.3574$ per iteration) is explicit. Per-step computational complexity aligns with unpreconditioned RGD: gradient formation in $O(|\Omega|)$, preconditioning overhead $O(m|\Omega|)$, and TT-SVD retraction as the dominant cost $O(m\, d_{\max} r_{\max}^3)$.

5. Alternating Step Mechanics and Tangent-Space Perturbations

The PRGD design separates manifold navigation from tangent-space exploration:

  • Manifold steps (via $\operatorname{Retr}_x$): used when sufficiently far from critical points; guarantee deterministic decrease in $f$.
  • Tangent-space perturbed steps: exploit the linear structure of $T_x\mathcal{M}$ near potential saddle points, enabling isotropic perturbations and direct adaptation of the Euclidean PGD analysis.
  • Switching between these modes ensures transfer of Euclidean complexity guarantees to the manifold setting, obviating the need for costly Hessian computations.

6. Hyperparameter Selection and Example Applications

All PRGD parameters (step size $\eta$, perturbation radius $r$, inner-loop length $\mathcal{T}$) are functions of problem-specific constants: the Lipschitz constants of the pullback gradient and Hessian ($\ell$, $\rho$), the user tolerance $\epsilon$, the probability parameter $\delta$, and the radius $b$. These may require empirical tuning or estimation via adaptive procedures.

Example: PCA as manifold optimization (Criscitiello et al., 2019). On the sphere $\mathcal{M} = S^{d-1}$ with cost $f(x) = -\frac{1}{2} x^\top A x$, the standard projection retraction yields $L = 2.5\|A\|$ and $\rho = 9\|A\|$, so $\ell$ can be taken on the order of $\|A\|$. PRGD thus admits a concrete instantiation and complexity scaling in high-dimensional data analysis.
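These constants can be spot-checked numerically. The sketch below (illustrative, not a proof) takes one RGD step with $\eta = 1/L$ and $L = 2.5\|A\|$ at random points on the sphere and tests a slightly slackened sufficient-decrease inequality (theory predicts a decrease of $(\eta/2)\|g\|^2$; the factor $0.3$ leaves slack for the approximation):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 50
B = rng.standard_normal((d, d))
A = (B + B.T) / 2
L = 2.5 * np.linalg.norm(A, 2)       # Lipschitz constant from the example
eta = 1.0 / L

f = lambda y: -0.5 * y @ A @ y
ok = True
for _ in range(100):                 # random points on S^{d-1}
    x = rng.standard_normal(d); x /= np.linalg.norm(x)
    g = -(A @ x) + (x @ A @ x) * x   # Riemannian gradient P_x(-Ax)
    y = x - eta * g
    x_new = y / np.linalg.norm(y)    # projection retraction
    # slackened sufficient-decrease check (factor 0.3 instead of 0.5)
    ok &= f(x_new) <= f(x) - 0.3 * eta * (g @ g)
print(ok)
```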

Example: TT-format tensor completion (Bian et al., 23 Jan 2025). Applications include:

  • Synthetic tensor completion: PRGD reduces iteration count and CPU time by up to two orders of magnitude compared to RGD.
  • Hyperspectral image completion: Achieves improved PSNR and up to tenfold convergence acceleration.
  • Quantum state tomography: Demonstrates 5–10× reduction in iterations and substantial CPU-time savings, dependent on step-size schedule.

7. Comparison with Standard RGD and Euclidean PGD

The following table summarizes complexity and requirements:

| Algorithm | First-order queries | Queries to second-order point | Dimension dependence | Retraction assumptions |
|---|---|---|---|---|
| Riemannian GD | $O(1/\epsilon^2)$ | – (no escape guarantee) | – | Any retraction (A2) |
| Euclidean PGD | $O((\log d)^4/\epsilon^2)$ | $O((\log d)^4/\epsilon^2)$ | $(\log d)^4$ | Not applicable |
| PRGD | $O((\log d)^4/\epsilon^2)$ | $O((\log d)^4/\epsilon^2)$ | $(\log d)^4$ | Retraction satisfying A2–A4 |

All algorithms operate with first-order information only. PRGD matches the best-known complexity for saddle point escape in unconstrained Euclidean settings while handling manifold constraints with mild additional conditions on the retraction (and no explicit manifold curvature bounds) (Criscitiello et al., 2019, Bian et al., 23 Jan 2025).

References

  • C. Criscitiello and N. Boumal, "Efficiently Escaping Saddle Points on Manifolds" (Criscitiello et al., 2019).
  • C. Jin et al., "How to Escape Saddle Points Efficiently," ICML 2017.
  • P.-A. Absil, R. Mahony, and R. Sepulchre, "Optimization Algorithms on Matrix Manifolds," Princeton University Press, 2008.
  • Bian et al., "Fast and Provable Tensor-Train Format Tensor Completion via Preconditioned Riemannian Gradient Descent" (Bian et al., 23 Jan 2025).