JORC-UMAP: Enhanced Manifold Learning
- JORC-UMAP methods are advanced extensions of vanilla UMAP, integrating geometric, topological, and spectral priors to address issues like topological tearing and structural collapse.
- The methodologies include Jaccard Ollivier-Ricci curvature adjustments, joint characterization for hyperspectral analysis, and spectral coarsening for scalable embedding, each tailored to enhance manifold fidelity.
- These techniques enable improved cluster separation, robust feature extraction, and accelerated processing, offering practical advantages in high-dimensional data visualization and analysis.
JORC-UMAP refers to several distinct methodologies that combine UMAP (Uniform Manifold Approximation and Projection) with additional geometric, topological, or spectral information, each designed to overcome a specific limitation of vanilla UMAP. Prominent JORC-UMAP approaches include (1) Jaccard Ollivier-Ricci Curvature UMAP, which injects geometric curvature and neighborhood overlap as graph priors (Li et al., 23 Jan 2026); (2) Joint Characterization plus UMAP for hyperspectral image manifold analysis (Sousa et al., 2023); and (3) Jointly Organized Representation Compression for UMAP via spectral coarsening for scalable embedding (Wang, 2024). All three modify or extend UMAP’s graph construction or objective, whether to capture deeper manifold structure, to suppress spurious links, or to gain information-theoretic and computational advantages.
1. Motivation and UMAP Limitations
UMAP is a state-of-the-art nonlinear dimensionality reduction (DR) technique widely utilized for visualization and feature extraction of high-dimensional data. Despite its effectiveness, two key limitations have been repeatedly identified:
- Topological tearing: disconnections in the low-dimensional embedding, where thin bridges in the data manifold (such as those in the Swiss roll) are mistakenly separated.
- Structural collapse: Over-contraction or merging of intrinsically curved manifold branches, leading to loss of structural differentiation.
These failures arise from UMAP’s reliance on locally Euclidean, k-nearest neighbor (k-NN) graphs, which may ignore manifold curvature and admit spurious neighborhood connections. JORC-UMAP variants address these limitations by enriching the UMAP graph construction with geometric, topological, or spectral priors to better capture true manifold structure (Li et al., 23 Jan 2026, Sousa et al., 2023, Wang, 2024).
2. Jaccard Ollivier-Ricci Curvature UMAP (JORC-UMAP)
2.1 Geometric Prior: Ollivier–Ricci Curvature
JORC-UMAP introduces edge-wise Ollivier–Ricci curvature (ORC), capturing local geometric bottlenecks in the manifold graph $G = (V, E)$. For each edge $(i, j) \in E$:
- Construct lazy random-walk measures $\mu_i$, $\mu_j$, using a “laziness” parameter $\alpha$ and the original UMAP edge weights $w_{ij}$.
- Compute the 1-Wasserstein distance $W_1(\mu_i, \mu_j)$ (approximated via entropic Sinkhorn-Knopp iteration for efficiency).
- Define edge curvature:
$$\kappa_{ij} = 1 - \frac{W_1(\mu_i, \mu_j)}{d(i, j)},$$
where $d(i, j)$ is the shortest-path graph distance.
Edges with negative curvature ($\kappa_{ij} < 0$) are “bottlenecks” and their weights are strengthened, with the amount controlled by a strength hyperparameter.
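The curvature computation described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of unweighted hop counts as the shortest-path distance, the dense Floyd-Warshall routine, and the Sinkhorn settings (`eps`, `n_iter`) are all choices made here for readability. The graph is assumed to be connected, and the reweighting of bottleneck edges is left out because only the curvature itself is defined in the text.

```python
import numpy as np

def lazy_measure(W, i, alpha):
    """Lazy random-walk measure at node i: mass alpha stays put,
    the remaining 1 - alpha spreads to neighbors in proportion to edge weight."""
    mu = (1.0 - alpha) * W[i] / W[i].sum()
    mu[i] += alpha
    return mu

def hop_distances(W):
    """All-pairs shortest-path hop counts via Floyd-Warshall (small graphs only)."""
    D = np.where(W > 0, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(len(W)):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def sinkhorn_w1(mu, nu, D, eps=0.1, n_iter=500):
    """Entropic (Sinkhorn-Knopp) approximation of the 1-Wasserstein distance."""
    s, t = np.flatnonzero(mu), np.flatnonzero(nu)   # restrict to the supports
    a, b, C = mu[s], nu[t], D[np.ix_(s, t)]
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                 # approximate transport plan
    return float((P * C).sum())

def ollivier_ricci(W, alpha=0.1):
    """Curvature 1 - W1(mu_i, mu_j) / d(i, j) for every undirected edge (i < j)."""
    D = hop_distances(W)
    kappa = {}
    for i, j in zip(*np.nonzero(np.triu(W, 1))):
        w1 = sinkhorn_w1(lazy_measure(W, i, alpha), lazy_measure(W, j, alpha), D)
        kappa[(int(i), int(j))] = 1.0 - w1 / D[i, j]
    return kappa

# Toy graph (hypothetical): two triangles joined by the single bridge edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
kappa = ollivier_ricci(W, alpha=0.1)
print(kappa[(2, 3)], kappa[(0, 1)])
```

On this toy graph the bridge edge should come out with negative curvature and the edges inside each triangle with positive curvature, which is the bottleneck signal the method relies on.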
2.2 Topological Prior: Jaccard Similarity
To reduce noise artifacts:
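- Compute the Jaccard similarity of the neighborhoods of each edge's endpoints, $J_{ij} = \frac{|N(i) \cap N(j)|}{|N(i) \cup N(j)|}$, where $N(i)$ is the neighbor set of node $i$.
- A threshold on $J_{ij}$ (default $0.1$) distinguishes true skeleton edges from shortcut edges, in combination with a condition on the edge curvature:
- Jaccard similarity at or above the threshold: preserve/boost.
- Jaccard similarity below the threshold: suppress by down-weighting the edge.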
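As an illustration of the neighborhood-overlap test, the following sketch computes the Jaccard similarity for every directed k-NN edge and flags edges by the default threshold of 0.1. The helper names, the brute-force neighbor search, and the choice to exclude a point from its own neighbor set are assumptions made here; the curvature condition that accompanies the threshold is not modeled.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbors of each row of X (brute force)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def jaccard_edges(nbrs):
    """Jaccard similarity |N(i) & N(j)| / |N(i) | N(j)| for each k-NN edge (i, j)."""
    sets = [set(row.tolist()) for row in nbrs]
    J = {}
    for i, s_i in enumerate(sets):
        for j in s_i:
            J[(i, j)] = len(s_i & sets[j]) / len(s_i | sets[j])
    return J

# Two well-separated groups of three points on a line (hypothetical data).
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
J = jaccard_edges(knn_indices(X, k=2))

threshold = 0.1                            # default threshold quoted in the text
skeleton = {e for e, v in J.items() if v >= threshold}
shortcut = {e for e, v in J.items() if v < threshold}
```

Edges in `shortcut` are the candidates for suppression; how strongly they are down-weighted depends on the suppression factor listed among the hyperparameters.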
2.3 Combined Edge Update and Embedding
The update multiplicatively combines these modifiers. After edge reweighting:
- Symmetrize weights for fuzzy simplicial set construction.
- Run standard UMAP stochastic gradient descent minimizing cross-entropy between high and low-dimensional graphs.
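For reference, the two steps above can be written out with the formulas used by standard UMAP: the probabilistic union for symmetrization and the fuzzy-set cross-entropy as the embedding objective. This sketch assumes JORC-UMAP leaves both unchanged, as the text suggests; the function names and the clipping constant are choices made here.

```python
import numpy as np

def fuzzy_union(W):
    """Symmetrize directed weights with the probabilistic union a + b - a*b."""
    return W + W.T - W * W.T

def fuzzy_cross_entropy(P, Q, tiny=1e-12):
    """Cross-entropy between high-dimensional (P) and low-dimensional (Q)
    edge membership strengths, summed over all pairs."""
    P = np.clip(P, tiny, 1.0 - tiny)      # avoid log(0)
    Q = np.clip(Q, tiny, 1.0 - tiny)
    return float(np.sum(P * np.log(P / Q)
                        + (1.0 - P) * np.log((1.0 - P) / (1.0 - Q))))

# Reweighted directed weights for a two-node toy graph (hypothetical values).
W = np.array([[0.0, 0.5],
              [0.2, 0.0]])
S = fuzzy_union(W)                         # off-diagonal: 0.5 + 0.2 - 0.1
loss = fuzzy_cross_entropy(S, np.full_like(S, 0.5))
```

In practice the objective is minimized by stochastic gradient descent with negative sampling rather than evaluated densely as here.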
2.4 Hyperparameters
Key values (recommended):
| Parameter | Default/Recommended Range |
|---|---|
| Number of neighbors $k$ | 10–20 |
| Laziness $\alpha$ | 0.0–0.2 |
| Curvature strength | 1.5–3.0 |
| Jaccard threshold | 0.05–0.15 |
| Curvature suppression factor | ≈0.9 |
| Noise parameter | |
| Sinkhorn regularization | max iters = 50 |
2.5 Empirical Performance
- Synthetic manifolds: JORC-UMAP eliminates tearing/collapse (Swiss roll, trefoil knot).
- MNIST/Fashion-MNIST: results are reported for SVM accuracy (UMAP baseline $0.967$), random-triplet score (UMAP baseline $0.615$), and centroid-triplet score (UMAP baseline $0.716$).
- Computational cost: the extra cost over vanilla UMAP is confined to the graph-reweighting stage; the embedding phase is identical.
JORC-UMAP is also compatible as a drop-in replacement within graph-based DR frameworks such as TriMap and PaCMAP (Li et al., 23 Jan 2026).
3. Joint Characterization–UMAP (JC–UMAP) in Hyperspectral Analysis
In hyperspectral Earth observation, JORC-UMAP denotes “Joint Characterization plus UMAP” (Sousa et al., 2023). The approach combines:
- Global structure: a three-endmember linear spectral mixture model (SVD: Substrate, Vegetation, Dark), providing Substrate, Vegetation, and Dark fractions for each spectrum.
- Local detail: UMAP applied to (i) raw reflectance and (ii) mixture residuals (MR), extracting nonlinear spectral variability.
Clusters within the joint space (global fraction, UMAP coordinate) identify distinct material subtypes, confirmed via silhouette scores ($0.65$–$0.75$) and spatial consistency.
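The global half of the joint characterization, a linear mixture fit followed by extraction of the mixture residual, can be sketched as below. Unconstrained least squares is used here as a simplifying assumption (operational unmixing often adds a unit-sum or non-negativity constraint), and the function name, array shapes, and synthetic data are hypothetical.

```python
import numpy as np

def unmix(spectra, endmembers):
    """Least-squares endmember fractions and the mixture residual.

    spectra:    (n_pixels, n_bands) reflectance spectra
    endmembers: (3, n_bands) Substrate, Vegetation, Dark endmember spectra
    """
    F, *_ = np.linalg.lstsq(endmembers.T, spectra.T, rcond=None)
    fractions = F.T                              # (n_pixels, 3)
    residual = spectra - fractions @ endmembers  # what the linear model misses
    return fractions, residual

# Synthetic check: spectra that are exact mixtures of three random endmembers.
rng = np.random.default_rng(0)
endmembers = rng.uniform(0.0, 1.0, size=(3, 20))
true_fractions = rng.dirichlet(np.ones(3), size=50)
spectra = true_fractions @ endmembers
fractions, residual = unmix(spectra, endmembers)
```

The `residual` array (or the raw `spectra`) is what would then be passed to a UMAP implementation, for example `umap.UMAP().fit_transform(residual)` from the umap-learn package, and the resulting coordinates stacked next to `fractions` to form the joint space.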
Results
- SVD captures 99% of spectral variance in 3 PCs; 99% of pixels .
- EMIT (hyperspectral) UMAP/MR-UMAP manifolds exhibit fine, separated clusters not visible in multispectral equivalents.
- SVD fraction correlation across sensors.
- JORC-UMAP robustly separates subtle mineralogical/biological classes for large-scale mapping.
4. Spectral Coarsening for Scalable UMAP (JORC-UMAP)
A third context uses JORC-UMAP as “Jointly Organized Representation Compression,” referring to spectral graph coarsening to accelerate UMAP (Wang, 2024):
4.1 Workflow
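- Build the k-NN graph and compute its graph Laplacian $L$.
- Compute the bottom eigenvectors of $L$ and use them to form a pairwise spectral similarity.
- Cluster points with high mutual spectral similarity into subsets, and construct pseudo-samples from these subsets.
- Run UMAP on the pseudo-samples, then upsample the embedding to all original points.
This step reduces the computational burden from embedding every original point to embedding only the smaller set of pseudo-samples.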
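A compact sketch of this workflow follows. Several pieces are stand-ins chosen here rather than taken from the source: the unnormalized Laplacian, plain k-means on the spectral coordinates as the way of grouping spectrally similar points, cluster means as pseudo-samples, and nearest-cluster copying as the upsampling rule. The parameter names `k`, `r`, and `m` are likewise hypothetical, and the UMAP call itself is left to whichever implementation is in use.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric, unweighted k-NN adjacency matrix (brute force)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    idx = np.argsort(d2, axis=1)[:, :k]
    A = np.zeros((len(X), len(X)))
    A[np.repeat(np.arange(len(X)), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)

def spectral_coords(A, r):
    """Bottom r eigenvectors of the unnormalized Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    return vecs[:, :r]

def kmeans(Z, m, n_iter=50, seed=0):
    """Plain Lloyd iterations; returns a cluster label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=m, replace=False)]
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for c in range(m):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels

def coarsen(X, k=10, r=5, m=50):
    """Group spectrally similar points and average each group into a pseudo-sample."""
    labels = kmeans(spectral_coords(knn_graph(X, k), r), m)
    used = np.unique(labels)               # drop clusters that ended up empty
    pseudo = np.stack([X[labels == c].mean(axis=0) for c in used])
    return pseudo, np.searchsorted(used, labels)

def upsample(pseudo_embedding, assignment):
    """Give every original point the embedding of its pseudo-sample."""
    return pseudo_embedding[assignment]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
pseudo, assignment = coarsen(X, k=10, r=5, m=20)
# Placeholder for the embedding step: a real run would embed `pseudo` with UMAP.
pseudo_embedding = pseudo[:, :2]
Y = upsample(pseudo_embedding, assignment)
```

Only `pseudo` reaches the embedding step, so its row count, not the size of `X`, determines the cost of the UMAP run.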
4.2 Empirical Results
- USPS dataset: compressing the data to a smaller set of pseudo-samples achieves a speed-up with negligible loss in embedding quality (visual cluster separation identical).
- Spectral dimension appropriate for digit data.
A plausible implication is that this spectral coarsening generalizes to other graph-based embedding or clustering algorithms (Wang, 2024).
5. Comparative Table of JORC-UMAP Approaches
| JORC-UMAP Variant | Graph Modifier | Application Domain | Principal Benefit | Reference |
|---|---|---|---|---|
| Jaccard Ollivier-Ricci Curvature | Curvature, Jaccard index | General DR, manifold learning | Faithful topology, less tearing/collapse | (Li et al., 23 Jan 2026) |
| Joint Characterization (JC) + UMAP | SVD fractions, MR features | Hyperspectral imaging, remote sensing | Multiscale clusters, physical interpretability | (Sousa et al., 2023) |
| Spectral Coarsening Graph Compression | Laplacian eigenmodes | Large-scale embedding | Computational speed-up, fidelity preservation | (Wang, 2024) |
6. Limitations and Prospects
JORC-UMAP methods offer significant gains in structural preservation and/or computational tractability, but have explicit limitations:
- Measurement noise can distort ORC and Jaccard estimates; threshold calibration is nontrivial.
- Under coarsening, rare or fine-scale manifold features risk being subsumed.
- SVD-based JC–UMAP cannot model heavily non-continuum surfaces (e.g., urban, evaporites) without local or high-order endmember models.
- Interpretation of rich UMAP-JC clusters in hyperspectral contexts typically requires substantial domain expertise.
Extension opportunities include adaptive spectral coarsening, recursive multi-level compression for massive data, and integration with segmentation/classification pipelines (Sousa et al., 2023, Wang, 2024).
7. Implementation and Broader Impact
JORC-UMAP algorithms are designed as modular, drop-in components for graph-based DR. The geometrically and topologically enriched variants robustly restore manifold connectivity, while spectral coarsening ensures scalability to large datasets. As high-SNR hyperspectral sensors become more widely deployed, the JC–UMAP framework in particular provides a systematic exploratory workflow for uncovering multiscale physical structures in Earth and planetary data (Sousa et al., 2023). The generalization of spectral graph coarsening suggests broader implications for scalable manifold learning beyond UMAP (Wang, 2024).