
Riemannian PCA: Manifold Dimension Reduction

Updated 13 February 2026
  • Riemannian PCA is a dimension reduction method that extends classical PCA to data on curved manifolds by respecting their intrinsic geometry.
  • It maps data into the tangent space at the Fréchet mean via the Riemannian logarithm map and returns via the exponential map, identifying geodesic submanifolds that play the role of principal directions.
  • Variants such as PGA, RFPCA, SFPCA, and Riemannian sparse PCA adapt the framework to diverse manifold structures, supporting applications from trajectory analysis to shape analysis.

Riemannian Principal Component Analysis (R-PCA) is a class of dimension reduction methods that extend the classical Principal Component Analysis framework to data residing on Riemannian manifolds rather than Euclidean spaces. These methodologies respect the intrinsic geometry of the data, enabling analysis in contexts where the underlying structure prohibits direct vector space operations, such as on spheres, hyperbolic spaces, shape manifolds, and spaces of positive definite matrices.

1. Mathematical Foundations and Problem Formulation

Let $\mathcal{M}$ denote a smooth $d$-dimensional Riemannian manifold embedded in $\mathbb{R}^{d_0}$ and equipped with the Riemannian metric $g_p(\cdot,\cdot)$ induced from the ambient space. The geodesic distance on $\mathcal{M}$ is denoted by $d_\mathcal{M}(p,q)$. A canonical example is the sphere $S^d \subset \mathbb{R}^{d+1}$, where $d_\mathcal{M}(p, q) = \arccos \langle p, q \rangle$ and the curvature is constant and positive.
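For concreteness, the sphere's closed-form geodesic distance and its logarithm and exponential maps can be sketched in a few lines of NumPy. This is an illustrative sketch; the function names are not from any cited implementation:

```python
import numpy as np

def sphere_dist(p, q):
    """Geodesic distance on the unit sphere: d(p, q) = arccos(<p, q>)."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

def sphere_log(p, x):
    """Riemannian logarithm Log_p(x): tangent vector at p pointing toward x,
    with length equal to the geodesic distance d(p, x)."""
    v = x - np.dot(p, x) * p            # project x onto the tangent space at p
    nv = np.linalg.norm(v)
    if nv < 1e-12:                      # x coincides with p (up to tolerance)
        return np.zeros_like(p)
    return sphere_dist(p, x) * v / nv

def sphere_exp(p, v):
    """Riemannian exponential Exp_p(v): follow the geodesic from p along v."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv
```

By construction, `sphere_exp(p, sphere_log(p, x))` recovers `x` whenever `x` is not antipodal to `p`, which is the round-trip property the tangent-space methods below rely on.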

Given a dataset $\{x_i\}_{i=1}^N \subset \mathcal{M}$, classical linear subspaces are replaced by geodesic submanifolds or by exponential images of linear subspaces in the tangent space. The Fréchet mean $p^*$ of the data is defined as

$$p^* = \arg\min_{p \in \mathcal{M}} \sum_{i=1}^N d^2_\mathcal{M}(p, x_i),$$

which is well-defined and unique within a convex geodesic ball, provided the data lie within the injectivity radius (Ichi et al., 5 Feb 2026).
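In practice the minimizer is usually found by a fixed-point iteration: average the log-mapped data in the tangent space at the current estimate, then step back to the manifold with the exponential map. A minimal sketch for the unit sphere, using hypothetical helper names and assuming the data are well clustered:

```python
import numpy as np

def frechet_mean_sphere(X, iters=100, tol=1e-10):
    """Intrinsic (Fréchet) mean on the unit sphere via iterated
    tangent-space averaging: p <- Exp_p(mean_i Log_p(x_i))."""
    def log(p, x):
        v = x - (p @ x) * p
        nv = np.linalg.norm(v)
        if nv < 1e-12:
            return np.zeros_like(p)
        return np.arccos(np.clip(p @ x, -1.0, 1.0)) * v / nv

    def exp(p, v):
        nv = np.linalg.norm(v)
        if nv < 1e-12:
            return p
        return np.cos(nv) * p + np.sin(nv) * v / nv

    p = X[0].copy()                       # any data point works as a start
    for _ in range(iters):
        g = np.mean([log(p, x) for x in X], axis=0)  # Riemannian gradient step
        if np.linalg.norm(g) < tol:
            break
        p = exp(p, g)
    return p
```

The stopping criterion uses the fact that the mean tangent vector vanishes exactly at a stationary point of the Fréchet objective.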

For functional data $Y_i: \mathcal{T} \to \mathcal{M}$, where $\mathcal{T}$ is a compact time interval, one defines a time-varying intrinsic Fréchet mean $\mu(t)$ via

$$\mu(t) = \arg\min_{p \in \mathcal{M}} \mathbb{E}\big[d^2_\mathcal{M}(Y_i(t), p)\big]$$

(Dai et al., 2017).

2. Principal Geodesic Analysis and Tangent Space Methods

Principal Geodesic Analysis (PGA), a foundational form of R-PCA, proceeds by first mapping the data to the tangent space at the Fréchet mean via the Riemannian logarithm map, $z_i = \mathrm{Log}_{p^*}(x_i) \in T_{p^*}\mathcal{M}$. A covariance operator is then defined as

$$\Sigma = \frac{1}{N}\sum_{i=1}^{N} z_i \otimes z_i,$$

and the directions of maximal variance are the top $k$ eigenvectors $u_j$ of $\Sigma$, satisfying $\Sigma u_j = \lambda_j u_j$. The associated $k$-dimensional geodesic submanifold is $\{\mathrm{Exp}_{p^*}(\sum_{j=1}^k a_j u_j) : a \in \mathbb{R}^k\}$ (Ichi et al., 5 Feb 2026, Rodríguez, 30 May 2025).

Principal component scores are computed as $\langle z_i, u_j \rangle$. The method reduces to classical PCA when the curvature vanishes, i.e., when $\mathcal{M} = \mathbb{R}^d$.
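The whole PGA pipeline (Fréchet mean, log map, tangent covariance, eigendecomposition, scores, and reconstruction via the exponential map) fits in a short sketch. This toy sphere implementation assumes the data lie well within the injectivity radius; helper names are illustrative:

```python
import numpy as np

def _log(p, x):
    """Log map on the unit sphere."""
    v = x - (p @ x) * p
    nv = np.linalg.norm(v)
    return np.zeros_like(p) if nv < 1e-12 else np.arccos(np.clip(p @ x, -1.0, 1.0)) * v / nv

def _exp(p, v):
    """Exp map on the unit sphere."""
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def pga_sphere(X, k=1, iters=100):
    # Step 1: Fréchet mean by iterated tangent-space averaging.
    p = X[0].copy()
    for _ in range(iters):
        g = np.mean([_log(p, x) for x in X], axis=0)
        if np.linalg.norm(g) < 1e-12:
            break
        p = _exp(p, g)
    # Step 2: lift the data to the tangent space at the mean.
    Z = np.array([_log(p, x) for x in X])
    # Step 3: covariance operator and its top-k eigenvectors (principal directions).
    Sigma = Z.T @ Z / len(X)
    w, U = np.linalg.eigh(Sigma)                  # eigh returns ascending eigenvalues
    U = U[:, np.argsort(w)[::-1][:k]]
    # Step 4: scores <z_i, u_j>, then reconstruct via Exp_p(sum_j a_j u_j).
    scores = Z @ U
    recon = np.array([_exp(p, U @ a) for a in scores])
    return p, U, scores, recon
```

For data lying exactly on a one-dimensional geodesic (a great circle through the mean), a single component reconstructs the sample perfectly, mirroring the way classical PCA recovers data on a line.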

3. Algorithmic Procedures and Variants

Several algorithmic frameworks for R-PCA appear in the literature, with specific attention to computational feasibility and geometric faithfulness:

  • Riemannian Functional Principal Component Analysis (RFPCA): For functional manifold-valued data, RFPCA first aligns trajectories by the time-varying Fréchet mean, applies the logarithm map into the tangent bundle, and performs FPCA in the resulting Hilbert space of square-integrable tangent curves. Eigen-decomposition yields orthonormal eigenfunctions and principal component scores, and truncated reconstructions are mapped back to the manifold with the exponential map (Dai et al., 2017).
  • Space Form PCA (SFPCA): In constant curvature manifolds (e.g., spheres, hyperbolic spaces), SFPCA defines affine subspaces via the exponential map at a base point and solves an eigenproblem for the sample second-moment matrix, yielding globally optimal, nested subspaces with closed-form computational procedures (Tabaghi et al., 2023).
  • UMAP-based R-PCA: For discrete datasets approximating a manifold, one can equip the data with a local distance metric (e.g., UMAP) to define Riemannian structure. Logarithmic and exponential maps are realized with weighted differences and additions, and PCA proceeds in the tangent space at the data-driven Fréchet mean (Rodríguez, 30 May 2025).
  • Principal Sub-manifolds: This approach extends the R-PCA concept to non-geodesic, higher-dimensional modes by identifying submanifolds whose tangent spaces optimally align with local principal directions estimated via localized tangent-space PCA (Yao et al., 2016).
  • Symmetric Space PCA: In symmetric spaces (e.g., $n$-spheres, Grassmannians), totally geodesic submanifolds generalize linear subspaces. The optimal submanifold minimizes the summed squared projection distances, and for spheres and Grassmannians, closed-form SVD-based solutions provide nested principal submanifolds (Marsland et al., 2019).
  • Riemannian Sparse PCA (on Stiefel Manifolds): For dimensionality reduction with a sparsity constraint, the problem is formulated on the Stiefel manifold, and solved using accelerated Riemannian proximal gradient methods that generalize FISTA (Huang et al., 2019).

4. Theoretical Guarantees and Statistical Properties

Several statistical guarantees underlie R-PCA methods:

  • For RFPCA, under nonnegative curvature, the residual variance between original and reconstructed trajectories is rigorously controlled and bounded by the tangent-space approximation; the truncation error converges to zero as the number of retained components $K \to \infty$, and root-$n$ convergence rates apply to the covariance, eigenfunctions, and principal component scores (Dai et al., 2017).
  • Central limit theorems for the estimated Fréchet mean (and function) hold under smoothness and regularity, with limiting distributions given by Gaussian processes characterized via the Hessian of the squared distance function (Dai et al., 2017, Ichi et al., 5 Feb 2026).
  • SFPCA ensures global optimality and strict nesting of subspaces via proper cost functions ($\sin^2(\sqrt{C}\,r)$ on the sphere, $\sinh^2(\sqrt{|C|}\,r)$ in hyperbolic space), reducing the subspace search to a single eigendecomposition (Tabaghi et al., 2023).
  • Model identifiability and uniqueness of principal directions in tangent space frameworks are guaranteed when eigenvalues are ordered and non-degenerate (Rodríguez, 30 May 2025).
  • Probabilistic generalizations employ stochastic development and principal bundles to build generative models for manifold-valued data without resorting to explicit linearization; curvature prevents global principal subspaces but allows local, infinitesimal principal directions (Sommer, 2018).

5. Applications and Empirical Performance

R-PCA methodologies have been deployed across a variety of application domains:

  • Trajectory and Compositional Data: RFPCA affords improved trajectory recovery for movement data on spheres (e.g., flight trajectories on $S^2$), with higher variance explained and improved classification accuracy over unconstrained FPCA. In compositional datasets (e.g., behavior patterns in fruit flies), RFPCA outperforms traditional FPCA for both trajectory approximation and subsequent predictive tasks (Dai et al., 2017).
  • Simulations and High-dimensional Data: In synthetic settings (e.g., nested circles in $\mathbb{R}^{10}$), UMAP-based R-PCA yields substantially higher explained variance and more faithful embedding of clusters than Euclidean PCA. On real high-dimensional image data (Olivetti faces), R-PCA achieves better class separation (Rodríguez, 30 May 2025).
  • Symmetric and Space Form Examples: SFPCA demonstrates both theoretical and computational advantages for microbiome compositional data (spherical) and gene-tree phylogenies (hyperbolic), with improved distortion metrics and classification accuracy compared to iterative approaches such as PGA and HoroPCA (Tabaghi et al., 2023).
  • Shape and Population Analysis: Principal sub-manifold methods have been successfully applied to shape analysis (handwritten digits, leaf outlines), revealing non-geodesic modes of population variability inaccessible to classical principal geodesics (Yao et al., 2016).

6. Comparison with Classical PCA and Methodological Extensions

Classical PCA fails to respect non-Euclidean constraints, leading to misleading variance decompositions and potentially invalid projections when applied to manifold-valued data. R-PCA approaches, by construction, yield lower approximation error, improved classification performance, and interpretable principal directions intrinsic to the data manifold.

A comparative overview:

| Method | Geometry Used | Projection Type |
|---|---|---|
| Classical PCA | Euclidean ($\mathbb{R}^d$) | Linear subspace |
| PGA/RFPCA | Arbitrary manifold | Geodesic submanifold |
| SFPCA | Constant curvature (space form) | Affine tangent subspace via eigendecomposition |
| Principal sub-manifold | Embedded manifold | Smooth, non-geodesic submanifold |

Extensions include Riemannian adaptations of sparse PCA (e.g., via optimization on the Stiefel manifold), probabilistic models utilizing stochastic development, and localized PCA variants structured for high curvature or stratified data geometries (Huang et al., 2019, Sommer, 2018, Rodríguez, 30 May 2025).

7. Computational Considerations and Complexity

R-PCA algorithms incur complexity dominated by mean computation (often iterative, unless cost functions admit closed form), assembly and eigendecomposition of covariance operators (in the tangent space), and repeated application of exponential and logarithm maps.

  • For SFPCA: overall complexity is $O(ND^2 + D^3)$ (Tabaghi et al., 2023).
  • For UMAP-driven R-PCA: dominant costs are $O(n \log n)$ for $k$-NN searches and $O(np^2 + p^3)$ for forming and diagonalizing covariance matrices (Rodríguez, 30 May 2025).
  • For functional data: Hilbert space computations inherit time complexity from grid discretization and integral operators (Dai et al., 2017).

In summary, Riemannian PCA provides a mathematically principled, statistically sound, and computationally tractable methodology for dimension reduction and exploratory analysis on data respecting complex geometric constraints, with diverse algorithmic instantiations adapted to manifold type, data structure, and analytic goals (Dai et al., 2017, Tabaghi et al., 2023, Rodríguez, 30 May 2025, Ichi et al., 5 Feb 2026, Yao et al., 2016, Sommer, 2018, Marsland et al., 2019, Huang et al., 2019).
