
JORC-UMAP: Enhanced Manifold Learning

Updated 27 January 2026
  • JORC-UMAP methods are advanced extensions of vanilla UMAP, integrating geometric, topological, and spectral priors to address issues like topological tearing and structural collapse.
  • The methodologies include Jaccard Ollivier-Ricci curvature adjustments, joint characterization for hyperspectral analysis, and spectral coarsening for scalable embedding, each tailored to enhance manifold fidelity.
  • These techniques enable improved cluster separation, robust feature extraction, and accelerated processing, offering practical advantages in high-dimensional data visualization and analysis.

JORC-UMAP refers to several distinct, advanced methodologies that integrate UMAP (Uniform Manifold Approximation and Projection) with additional geometric, topological, or spectral information, each tailored to overcome specific limitations of vanilla UMAP. Prominent JORC-UMAP approaches include (1) Jaccard Ollivier-Ricci Curvature UMAP, which injects geometric curvature and neighborhood overlap as graph priors (Li et al., 23 Jan 2026); (2) Joint Characterization plus UMAP for hyperspectral image manifold analysis (Sousa et al., 2023); and (3) Jointly Organized Representation Compression for UMAP via spectral coarsening for scalable embedding (Wang, 2024). All three modify or extend UMAP’s graph construction or objective in order to capture deeper manifold structure, suppress spurious links, or gain information-theoretic and computational advantages.

1. Motivation and UMAP Limitations

UMAP is a state-of-the-art nonlinear dimensionality reduction (DR) technique widely utilized for visualization and feature extraction of high-dimensional data. Despite its effectiveness, two key limitations have been repeatedly identified:

  • Topological tearing: Disconnectedness in the low-dimensional embedding, where thin bridges in the data manifold—such as those in the Swiss roll—are mistakenly separated.
  • Structural collapse: Over-contraction or merging of intrinsically curved manifold branches, leading to loss of structural differentiation.

These failures arise from UMAP’s reliance on locally Euclidean, k-nearest neighbor (k-NN) graphs, which may ignore manifold curvature and admit spurious neighborhood connections. JORC-UMAP variants address these limitations by enriching the UMAP graph construction with geometric, topological, or spectral priors to better capture true manifold structure (Li et al., 23 Jan 2026, Sousa et al., 2023, Wang, 2024).

2. Jaccard Ollivier-Ricci Curvature UMAP (JORC-UMAP)

2.1 Geometric Prior: Ollivier–Ricci Curvature

JORC-UMAP introduces edge-wise Ollivier–Ricci curvature (ORC), capturing local geometric bottlenecks in the manifold graph $G=(V,E)$. For each edge $(i,j)$:

  • Construct lazy random-walk measures $m_i$, $m_j$ using “laziness” parameter $a$ and original UMAP edge weights $W_{ij}$.
  • Compute the 1-Wasserstein distance $W_1(m_i, m_j)$ (approximated via entropic Sinkhorn-Knopp iteration for efficiency).
  • Define edge curvature:

$$\kappa(i, j) = 1 - \frac{W_1(m_i, m_j)}{d(i, j)}$$

where $d(i, j)$ is the shortest-path graph distance.
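The curvature definition above can be checked on a toy graph. The following is a minimal sketch, not the authors' implementation: it assumes hop-count shortest paths for $d(i,j)$, and it swaps the Sinkhorn approximation for an exact transport linear program (via SciPy's `linprog`) so that the values are easy to verify by hand. The function names and the two-triangle test graph are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def hop_distances(W):
    """All-pairs shortest-path hop counts (Floyd-Warshall); assumes a connected graph."""
    n = len(W)
    D = np.where(W > 0, 1.0, np.inf)
    np.fill_diagonal(D, 0.0)
    for k in range(n):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def lazy_measure(W, i, a):
    """Lazy random-walk measure m_i: mass a stays at node i, the remaining
    1 - a is spread over neighbours in proportion to the edge weights W[i, :]."""
    m = (1.0 - a) * W[i] / W[i].sum()
    m[i] += a
    return m

def wasserstein1(mu, nu, D):
    """Exact W_1(mu, nu) under ground cost D, solved as a transport LP.
    (Assumption: exact LP used here in place of the Sinkhorn approximation.)"""
    n = len(mu)
    # The plan P is flattened row-major: entry (r, c) is variable r * n + c.
    row_sums = np.kron(np.eye(n), np.ones((1, n)))  # sum_c P[r, c] = mu[r]
    col_sums = np.kron(np.ones((1, n)), np.eye(n))  # sum_r P[r, c] = nu[c]
    res = linprog(D.ravel(), A_eq=np.vstack([row_sums, col_sums]),
                  b_eq=np.concatenate([mu, nu]), bounds=(0, None), method="highs")
    return res.fun

def ollivier_ricci(W, i, j, a=0.0):
    """kappa(i, j) = 1 - W_1(m_i, m_j) / d(i, j)."""
    D = hop_distances(W)
    return 1.0 - wasserstein1(lazy_measure(W, i, a), lazy_measure(W, j, a), D) / D[i, j]

# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge edge (2, 3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

kappa_bridge = ollivier_ricci(W, 2, 3)    # -2/3: the bridge is a bottleneck
kappa_triangle = ollivier_ricci(W, 0, 1)  # 0.5: edge inside a dense cluster
```

With $a=0$ the bridge edge has $W_1 = 5/3$ and the triangle edge has $W_1 = 1/2$, which gives the two curvature values in the comments.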

Edges with negative curvature ($\kappa(i, j)<0$) are “bottlenecks” and strengthened:

$$w_{ij}' = w_{ij} + (1 - w_{ij}) \cdot \tanh(S \cdot |\kappa(i, j)|)$$

with $S$ a strength hyperparameter.
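As a small worked example of the strengthening rule, the sketch below applies the formula to a single edge. It assumes that edges with non-negative curvature are left unchanged, which the text does not state explicitly; the function name is illustrative.

```python
import math

def strengthen(w, kappa, S=2.0):
    """Boost a negatively curved edge: w' = w + (1 - w) * tanh(S * |kappa|).
    Assumption: edges with kappa >= 0 keep their original weight."""
    if kappa >= 0:
        return w
    return w + (1.0 - w) * math.tanh(S * abs(kappa))

boosted = strengthen(0.5, -0.5, S=2.0)   # 0.5 + 0.5 * tanh(1.0)
unchanged = strengthen(0.5, 0.3, S=2.0)  # positive curvature, weight kept
```

Because $\tanh$ is bounded by 1, the updated weight moves toward 1 without exceeding it, so it remains a valid fuzzy membership strength.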

2.2 Topological Prior: Jaccard Similarity

To reduce noise artifacts:

  • Compute $\mathrm{Jaccard}(i, j) = \frac{|N(i)\cap N(j)|}{|N(i)\cup N(j)|}$
  • A threshold $\theta$ (default $0.1$) distinguishes true skeleton edges from shortcut edges:
    • If $\kappa<0$ and Jaccard $\geq \theta$, preserve/boost.
    • If $\kappa<0$ and Jaccard $<\theta$, suppress: $w_{ij} \leftarrow w^{(0)}_{ij} \times \varepsilon$ ($\varepsilon=10^{-5}$).
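The gating rule above can be sketched as follows. This is an illustration under assumptions: neighborhoods $N(i)$ are taken to be plain sets of neighbor indices (whether a node belongs to its own neighborhood is not specified in the text), and the function only handles suppression, leaving any boost to the curvature step. The names `jaccard` and `gate_edge` are hypothetical.

```python
def jaccard(N_i, N_j):
    """Jaccard overlap |N(i) & N(j)| / |N(i) | N(j)| of two neighbour sets."""
    union = N_i | N_j
    return len(N_i & N_j) / len(union) if union else 0.0

def gate_edge(w0, kappa, jac, theta=0.1, eps=1e-5):
    """Suppress a negatively curved edge whose endpoints share too few neighbours.
    Returns w0 * eps for a suspected shortcut, otherwise the original weight w0."""
    if kappa < 0 and jac < theta:
        return w0 * eps   # shortcut edge: almost removed from the graph
    return w0             # skeleton edge (or non-negative curvature): kept

overlap = jaccard({1, 2, 3}, {2, 3, 4})              # 2 shared / 4 total = 0.5
kept = gate_edge(0.8, kappa=-0.4, jac=overlap)       # 0.8
suppressed = gate_edge(0.8, kappa=-0.4, jac=0.02)    # 0.8 * 1e-5
```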

2.3 Combined Edge Update and Embedding

The update multiplicatively combines these modifiers. After edge reweighting:

  • Symmetrize weights for fuzzy simplicial set construction.
  • Run standard UMAP stochastic gradient descent minimizing cross-entropy between high and low-dimensional graphs.
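For reference, the two steps above can be written compactly. This sketch assumes the standard UMAP conventions: symmetrization by the probabilistic fuzzy union $W + W^\top - W \circ W^\top$, and the fuzzy-set cross-entropy as the objective. The text does not say whether JORC-UMAP alters either, and the SGD loop itself is omitted.

```python
import numpy as np

def fuzzy_union(W):
    """Symmetrize directed edge weights with the probabilistic t-conorm."""
    W = np.asarray(W, dtype=float)
    return W + W.T - W * W.T

def fuzzy_cross_entropy(P, Q, eps=1e-12):
    """Cross-entropy between high-dimensional memberships P and
    low-dimensional memberships Q, summed over all entries."""
    P = np.clip(np.asarray(P, dtype=float), eps, 1.0 - eps)
    Q = np.clip(np.asarray(Q, dtype=float), eps, 1.0 - eps)
    return float(np.sum(P * np.log(P / Q) + (1.0 - P) * np.log((1.0 - P) / (1.0 - Q))))

directed = np.array([[0.0, 0.8],
                     [0.5, 0.0]])
sym = fuzzy_union(directed)   # off-diagonal entries: 0.8 + 0.5 - 0.4 = 0.9
loss_same = fuzzy_cross_entropy(sym, sym)
loss_diff = fuzzy_cross_entropy(sym, np.array([[0.0, 0.5], [0.5, 0.0]]))
```

The loss is zero when the low-dimensional memberships match the reweighted graph exactly and positive otherwise, which is what the embedding phase drives down.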

2.4 Hyperparameters

Key values (recommended):

| Parameter | Default/Recommended Range |
| --- | --- |
| $k$ (neighbors) | 10–20 |
| $a$ (laziness) | 0.0–0.2 |
| $S$ (curvature strength) | 1.5–3.0 |
| $\theta$ (Jaccard) | 0.05–0.15 |
| $B$ (curvature suppression) | ≈0.9 |
| $\varepsilon$ (noise) | $10^{-5}$ |
| Sinkhorn regularization | $1\times 10^{-2}$; max iters = 50 |

2.5 Empirical Performance

  • Synthetic manifolds: JORC-UMAP eliminates tearing/collapse (Swiss roll, trefoil knot).
  • MNIST/Fashion-MNIST: SVM accuracy $\approx 0.96$ (vs. UMAP $0.967$), random-triplet $\approx 0.620$ (vs. $0.615$), centroid-triplet $\approx 0.724$ (vs. $0.716$).
  • Computational cost: $\approx 1.2\times$ vanilla UMAP; embedding phase identical.

JORC-UMAP is also fully compatible, as a drop-in replacement, with graph-based DR frameworks such as TriMap and PaCMAP (Li et al., 23 Jan 2026).

3. Joint Characterization–UMAP (JC–UMAP) in Hyperspectral Analysis

In hyperspectral Earth observation, JORC-UMAP denotes “Joint Characterization plus UMAP” (Sousa et al., 2023). The approach combines:

  • Global structure: three-endmember linear spectral mixture model (SVD: Substrate, Vegetation, Dark), providing fractions $f_S$, $f_V$, $f_D$ for each spectrum.
  • Local detail: UMAP applied to (i) raw reflectance and (ii) mixture residuals (MR), extracting nonlinear spectral variability.

Clusters within the joint space $(f_i, u_j)$ (global fraction, UMAP coordinate) identify distinct material subtypes, confirmed via silhouette scores ($0.65$–$0.75$) and spatial consistency.
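The global half of this joint characterization can be illustrated with a small unmixing sketch. It is not the published pipeline: the endmember spectra are random placeholders, the fractions come from unconstrained least squares (no sum-to-one or non-negativity constraint is assumed), and the UMAP step on the spectra or residuals is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bands = 20
# Hypothetical endmember spectra, one column each for Substrate, Vegetation, Dark.
E = rng.random((n_bands, 3))

def unmix(X, E):
    """Least-squares endmember fractions and mixture residuals.
    X: (pixels, bands) reflectance; E: (bands, 3) endmember spectra."""
    F, *_ = np.linalg.lstsq(E, X.T, rcond=None)  # solves E @ F = X.T
    F = F.T                                      # (pixels, 3): f_S, f_V, f_D
    R = X - F @ E.T                              # mixture residual per pixel
    return F, R

# Two synthetic pixels built as exact linear mixtures of the endmembers.
true_F = np.array([[0.2, 0.5, 0.3],
                   [0.7, 0.1, 0.2]])
X = true_F @ E.T
F, R = unmix(X, E)
```

In the joint space, each column of `F` would be paired with a UMAP coordinate computed from `X` or from the residuals `R`; for real spectra `R` is non-zero and carries the nonlinear variability that UMAP is meant to resolve.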

Results

  • SVD captures 99% of spectral variance in 3 PCs; 99% of pixels have $\mathrm{RMSE}<3.7\%$.
  • EMIT (hyperspectral) UMAP/MR-UMAP manifolds exhibit fine, separated clusters not visible in multispectral equivalents.
  • SVD fraction correlation $r>0.95$ across sensors.
  • JORC-UMAP robustly separates subtle mineralogical/biological classes for large-scale mapping.

4. Spectral Coarsening for Scalable UMAP (JORC-UMAP)

A third context uses JORC-UMAP as “Jointly Organized Representation Compression,” referring to spectral graph coarsening to accelerate UMAP (Wang, 2024):

4.1 Workflow

  • Build k-NN graph, compute Laplacian $L_G$.
  • Compute bottom $K$ eigenvectors ($U\in\mathbb{R}^{N\times K}$); form spectral similarity

$$s_{uv} = \frac{|\langle \mathbf{x}_u, \mathbf{x}_v \rangle|^2}{\|\mathbf{x}_u\|^2 \|\mathbf{x}_v\|^2}$$

  • Cluster points with high mutual $s_{uv}$ into $P$ subsets ($P\ll N$), construct pseudo-samples ($z_i$).
  • Run UMAP on pseudo-samples, then upsample embedding to all original points.

This step reduces computational burden from $O(N\log N)$ to $O(|E|\log|V|)+O(P\log P)$.
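The coarsening workflow can be sketched on a six-node graph. Several details here are assumptions, since the text leaves them open: $\mathbf{x}_u$ is taken to be row $u$ of $U$, the Laplacian is unnormalized, points are grouped by thresholding $s_{uv}$ at a hypothetical cutoff `tau`, and each pseudo-sample is the mean of its members. The UMAP and upsampling steps are omitted.

```python
import numpy as np

def spectral_similarity(W, K):
    """s_uv = <x_u, x_v>^2 / (|x_u|^2 |x_v|^2), with x_u the u-th row of the
    bottom-K eigenvector matrix U of the unnormalized Laplacian (assumption)."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)          # eigenvalues returned in ascending order
    U = vecs[:, :K]
    G = U @ U.T                          # Gram matrix of the spectral rows
    norms2 = np.diag(G)
    return G ** 2 / np.outer(norms2, norms2)

def group_by_similarity(S, tau=0.5):
    """Label connected groups of points linked by similarity above tau."""
    labels = -np.ones(len(S), dtype=int)
    current = 0
    for u in range(len(S)):
        if labels[u] >= 0:
            continue
        labels[u] = current
        stack = [u]
        while stack:
            p = stack.pop()
            for q in np.flatnonzero((S[p] > tau) & (labels < 0)):
                labels[q] = current
                stack.append(q)
        current += 1
    return labels

# Two tight triangles joined by one weak edge (weight 0.01).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.01

S = spectral_similarity(W, K=2)
labels = group_by_similarity(S)          # one label per triangle

# Pseudo-samples: mean feature vector of each group (features are placeholders).
X = np.arange(12, dtype=float).reshape(6, 2)
Z = np.vstack([X[labels == c].mean(axis=0) for c in range(labels.max() + 1)])
```

Here $N=6$ points are compressed to $P=2$ pseudo-samples; UMAP would then run on `Z`, and each original point would inherit a position derived from its group's embedding.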

4.2 Empirical Results

  • USPS dataset ($N\approx 9300$): compression to $P=N/5$ achieves a $\sim 5\times$ speed-up with negligible loss in embedding quality (visual cluster separation identical).
  • Spectral dimension $K\approx 10$–$50$ appropriate for digit data.

A plausible implication is that this spectral coarsening generalizes to other graph-based embedding or clustering algorithms (Wang, 2024).

5. Comparative Table of JORC-UMAP Approaches

| JORC-UMAP Variant | Graph Modifier | Application Domain | Principal Benefit | Reference |
| --- | --- | --- | --- | --- |
| Jaccard Ollivier-Ricci Curvature | Curvature, Jaccard index | General DR, manifold learning | Faithful topology, less tearing/collapse | (Li et al., 23 Jan 2026) |
| Joint Characterization (JC) + UMAP | SVD fractions, MR features | Hyperspectral imaging, remote sensing | Multiscale clusters, physical interpretability | (Sousa et al., 2023) |
| Spectral Coarsening Graph Compression | Laplacian eigenmodes | Large-scale embedding | Computational speed-up, fidelity preservation | (Wang, 2024) |

6. Limitations and Prospects

JORC-UMAP methods offer significant gains in structural preservation and/or computational tractability, but have explicit limitations:

  • Measurement noise can distort ORC and Jaccard estimates; threshold calibration is nontrivial.
  • Under coarsening, rare or fine-scale manifold features risk being subsumed.
  • SVD-based JC–UMAP cannot model heavily non-continuum surfaces (e.g., urban, evaporites) without local or high-order endmember models.
  • Interpretation of rich UMAP-JC clusters in hyperspectral contexts typically requires substantial domain expertise.

Extension opportunities include adaptive spectral coarsening, recursive multi-level compression for massive data, and integration with segmentation/classification pipelines (Sousa et al., 2023, Wang, 2024).

7. Implementation and Broader Impact

JORC-UMAP algorithms are designed as modular, drop-in components for graph-based DR. The geometrically and topologically enriched variants robustly restore manifold connectivity, while spectral coarsening ensures scalability to large $N$. As high-SNR hyperspectral sensors become more widely deployed, the JC–UMAP framework in particular provides a systematic exploratory workflow for uncovering multiscale physical structures in Earth and planetary data (Sousa et al., 2023). The generalization of spectral graph coarsening suggests broader implications for scalable manifold learning beyond UMAP (Wang, 2024).
