- The paper introduces HypeGBMS, a hyperbolic extension of GBMS that leverages Möbius gyrovector operations in the Poincaré ball model.
- It proves convergence and statistical consistency for the method and outperforms traditional Euclidean clustering methods on hierarchical datasets.
- Empirical results on real-world benchmarks and image segmentation tasks highlight improved ARI/NMI and refined hierarchical mode detection.
Hyperbolic Gaussian Blurring Mean Shift: Mode-Seeking Clustering in Non-Euclidean Geometries
Introduction
The paper "Hyperbolic Gaussian Blurring Mean Shift: A Statistical Mode-Seeking Framework for Clustering in Curved Spaces" (2512.11448) introduces HypeGBMS, a principled extension of the Gaussian Blurring Mean Shift (GBMS) algorithm for clustering in hyperbolic spaces, specifically within the Poincaré ball model. The motivation originates from the limitations of classical Euclidean clustering—especially density-based approaches like mean shift—on data with inherent hierarchical or tree-like structure. Hyperbolic geometry, through exponential volume growth and negative curvature, admits embeddings and metric relations that are highly compatible with hierarchical structures, as manifest for instance in taxonomies or scale-free networks. This work advances a framework for non-parametric, mode-seeking clustering on data manifolds endowed with negative curvature, ensuring intrinsic consistency, convergent dynamics, and computational tractability.
Methodology
Hyperbolic Geometry and Algebraic Structures
The foundation of HypeGBMS is established in the Poincaré ball model $\mathbb{D}_c^p$, parameterized by curvature $c<0$. All mean computations, updates, and distances are cast in terms of operations that are intrinsic to hyperbolic space: Möbius gyrovector addition, scalar multiplication, and geodesic exponential/logarithm maps. The algorithm replaces all Euclidean computations in GBMS with their hyperbolic analogues, specifically:
- Pairwise Hyperbolic Distances: Used for Gaussian kernel weights, associated with each data point and its neighbors.
- Möbius-weighted Means: Updating each point toward the local mean under hyperbolic geometry, ensuring iterates remain in the Poincaré ball.
- Curvature Control: The curvature parameter c modulates how strongly hierarchical effects manifest.
This construction guarantees that the mean shift flow respects the negative curvature and the non-Euclidean metric, a requirement for statistical mode-finding when the geometry of the underlying data manifold deviates from flatness.
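The gyrovector operations named above can be sketched in a few lines of NumPy. This is a minimal illustration using the standard Poincaré ball formulas, not the authors' implementation. It assumes a positive parameter `k = |c|`, so the ball has curvature `-k` and radius `1/sqrt(k)`; the function names are hypothetical.

```python
import numpy as np

def mobius_add(x, y, k=1.0):
    """Mobius addition x (+) y in the Poincare ball of curvature -k (k > 0)."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * k * xy + k * y2) * x + (1 - k * x2) * y
    den = 1 + 2 * k * xy + k**2 * x2 * y2
    return num / den

def mobius_scalar_mul(r, x, k=1.0):
    """Mobius scalar multiplication r (*) x."""
    sk, norm = np.sqrt(k), np.linalg.norm(x)
    if norm == 0:
        return x
    return np.tanh(r * np.arctanh(sk * norm)) * x / (sk * norm)

def hyp_dist(x, y, k=1.0):
    """Geodesic distance between two points of the ball."""
    sk = np.sqrt(k)
    return 2 / sk * np.arctanh(sk * np.linalg.norm(mobius_add(-x, y, k)))

def exp0(v, k=1.0):
    """Exponential map at the origin: tangent vector -> point in the ball."""
    sk, norm = np.sqrt(k), np.linalg.norm(v)
    if norm == 0:
        return v
    return np.tanh(sk * norm) * v / (sk * norm)

def log0(x, k=1.0):
    """Logarithm map at the origin: point in the ball -> tangent vector."""
    sk, norm = np.sqrt(k), np.linalg.norm(x)
    if norm == 0:
        return x
    return np.arctanh(sk * norm) * x / (sk * norm)
```

With `k = 1`, the distance from the origin to `(0.5, 0)` is `2 * artanh(0.5) = ln 3`, and `log0(exp0(v))` recovers `v`, which gives two quick sanity checks for any implementation of these maps.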
Algorithmic Structure
HypeGBMS consists of three stages: projection of input data onto the hyperbolic ball, iterative mode-seeking updates using Möbius weighted means, and cluster assignment by finding connected components in the induced similarity graph. Each update is guaranteed to remain in the admissible region of the manifold, with step sizes and pairwise influences reweighted by the hyperbolic Gaussian kernel.
The procedure's complexity is $O(T N^2 p)$, where T is the number of iterations, N the number of points, and p the dimension. The cost is dominated by the repeated computation of all pairwise hyperbolic distances and mean updates. The use of gyrovector algebra ensures closed-form, numerically stable updates.
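The three stages can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's reference code. It assumes `k = |c| > 0`, maps inputs into the ball with the exponential map at the origin, and uses the Möbius gyromidpoint as the weighted mean (the paper's exact update rule may differ). The names `hype_gbms`, `merge_eps`, and `EPS` are hypothetical.

```python
import numpy as np

EPS = 1e-5  # assumed safety margin from the ball boundary

def clip_to_ball(X, k):
    """Rescale rows so that sqrt(k) * ||x|| <= 1 - EPS."""
    max_norm = (1.0 - EPS) / np.sqrt(k)
    norms = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    return X * np.minimum(1.0, max_norm / norms)

def exp0(V, k):
    """Row-wise exponential map at the origin (stage 1 projection)."""
    sk = np.sqrt(k)
    norms = np.maximum(np.linalg.norm(V, axis=1, keepdims=True), 1e-12)
    return clip_to_ball(np.tanh(sk * norms) * V / (sk * norms), k)

def pairwise_dist(X, k):
    """All pairwise hyperbolic distances, arccosh form; O(N^2 p)."""
    sq = np.sum(X**2, axis=1)
    diff2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    denom = (1.0 - k * sq)[:, None] * (1.0 - k * sq)[None, :]
    return np.arccosh(1.0 + 2.0 * k * diff2 / denom) / np.sqrt(k)

def gyromidpoints(X, W, k):
    """Weighted Mobius gyromidpoint for every row of the weight matrix W."""
    lam = 2.0 / (1.0 - k * np.sum(X**2, axis=1))  # conformal factors
    Y = (W @ (lam[:, None] * X)) / (W @ (lam - 1.0))[:, None]
    sk = np.sqrt(k)
    norms = np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
    # Mobius scalar multiplication by 1/2, applied row-wise
    half = np.tanh(0.5 * np.arctanh(np.minimum(sk * norms, 1.0 - 1e-12)))
    return clip_to_ball(half * Y / (sk * norms), k)

def hype_gbms(X, k=1.0, sigma=0.5, T=50, tol=1e-6, merge_eps=1e-2):
    Z = exp0(np.asarray(X, dtype=float), k)       # stage 1: projection
    for _ in range(T):                            # stage 2: blurring updates
        D = pairwise_dist(Z, k)
        W = np.exp(-D**2 / (2.0 * sigma**2))      # hyperbolic Gaussian kernel
        Z_new = gyromidpoints(Z, W, k)
        shift = np.max(np.linalg.norm(Z_new - Z, axis=1))
        Z = Z_new
        if shift < tol:
            break
    # stage 3: connected components of the graph {d(z_i, z_j) < merge_eps}
    adj = pairwise_dist(Z, k) < merge_eps
    labels = -np.ones(len(Z), dtype=int)
    current = 0
    for i in range(len(Z)):
        if labels[i] >= 0:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u] & (labels < 0)):
                labels[v] = current
                stack.append(v)
        current += 1
    return Z, labels
```

Because the gyromidpoint is a convex combination in disguise, every update stays strictly inside the ball; the clipping step only guards against floating-point drift near the boundary. The dense distance matrix recomputed in each iteration is what produces the quadratic cost in N.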
Theoretical Analysis
Convergence and Consistency
The paper provides a rigorous convergence proof for HypeGBMS, leveraging the geometry of the Poincaré ball:
- Convergence under Compactness: If all iterates lie in a compact, convex region, the mode-seeking flow (a Riemannian gradient ascent on the hyperbolic KDE) monotonically increases the kernel density and converges to stationary points. The inexactness of Möbius means relative to the Riemannian Fréchet mean vanishes as the curvature or the local cluster diameter goes to zero.
- Statistical Consistency: Under classical kernel density convergence conditions ($N \to \infty$, kernel bandwidth $\sigma_N \to 0$, $N\sigma_N^p \to \infty$), the estimated modes in HypeGBMS converge (in probability) to the true density modes in the hyperbolic manifold.
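For orientation, the consistency conditions can be read against a kernel density estimate of the following form. This is a sketch under assumptions rather than the paper's exact definition: $d_c$ denotes the hyperbolic distance on the Poincaré ball, and the normalizing constant $C(\sigma_N)$ is notation introduced here.

```latex
\hat{f}_N(x) = \frac{1}{N\, C(\sigma_N)} \sum_{i=1}^{N}
  \exp\!\left( -\frac{d_c(x, x_i)^2}{2\sigma_N^2} \right),
\qquad
N \to \infty, \quad \sigma_N \to 0, \quad N \sigma_N^{p} \to \infty .
```

The first two conditions shrink the bias of the estimate, while the third keeps the expected number of samples inside each kernel window growing, so the variance also vanishes.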
Approximation and Scalability
While quadratic in N in its basic form, HypeGBMS can be made scalable by employing approximation methods from the kernel literature (Nyström, random feature expansions) and fast nearest neighbor search structures generalized to hyperbolic space.
Empirical Evaluation
HypeGBMS is benchmarked on 11 datasets with hierarchical or complex latent structure, including UCI and image datasets. It consistently outperforms the baseline methods (classical k-means, DBSCAN, Gaussian mean-shift, and various accelerated or weighted mean-shift variants) on ARI and NMI in both small- and large-scale settings. The gains are most pronounced on inherently hierarchical data (e.g., Phishing URL, Zoo, ORHD), where ARI/NMI improvements sometimes exceed 0.2 over competitors.
The capacity to separate clusters that represent different levels of hierarchy, as opposed to only compact Euclidean masses, is visually corroborated via t-SNE projections of clustering structure.
Figure 1: t-SNE visualizations on Glass, ORHD, and Phishing URL; HypeGBMS captures finer-grained hierarchical modes compared to standard GBMS.
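For readers unfamiliar with the ARI scores reported above, the metric can be computed from a contingency table as in the sketch below. It follows the standard Hubert-Arabie definition and is not taken from the paper's evaluation code; the function name is hypothetical.

```python
import numpy as np

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same points."""
    _, t = np.unique(np.asarray(labels_true).ravel(), return_inverse=True)
    _, p = np.unique(np.asarray(labels_pred).ravel(), return_inverse=True)
    n = len(t)
    # Contingency table: cont[a, b] = points in true class a and cluster b
    cont = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(cont, (t, p), 1)

    def comb2(x):
        # Number of unordered pairs, "x choose 2"
        return x * (x - 1) / 2.0

    sum_ij = comb2(cont).sum()
    sum_a = comb2(cont.sum(axis=1)).sum()
    sum_b = comb2(cont.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(n)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

The score is 1 for identical partitions regardless of label names and is close to 0 for random assignments, which is why a gain of 0.2 is a substantial difference.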
Qualitative Results in Image Segmentation
On BSDS500 and PASCAL VOC 2012, two canonical image segmentation benchmarks, HypeGBMS produces segments whose boundaries follow image regions more closely than those of traditional mean-shift or grid-based algorithms.
Figure 2: BSDS500 segmentation: HypeGBMS yields cleaner, more semantically consistent regions compared to Euclidean methods.
Figure 3: PASCAL VOC 2012 sample: HypeGBMS produces clusters better aligned with real compositional boundaries in complex images.
Parameter and Ablation Analysis
The paper provides a thorough investigation of bandwidth (Gaussian kernel σ) and curvature (c) effects on clustering metrics (ARI/NMI). Cluster fidelity is highly sensitive to both, with the best results typically achieved for $c \in [-1.0, -0.6]$ and moderate kernel bandwidth.
Figure 4: ARI/NMI sweeps over bandwidth for various curvatures; strong performance requires proper tuning, especially with negative curvature.
Figure 5: ARI/NMI as a function of curvature; strongly negative curvature (more negative c) yields the best hierarchical separation.
Implications and Future Directions
The formalism and empirical evidence in HypeGBMS strengthen the case for non-Euclidean clustering frameworks when the data exhibits natural hierarchy, partial order, or exponential expansion. The approach extends mean-shift's versatility from flat to curved geometries, making it broadly relevant to:
- Network and taxonomy analysis where latent tree-like relations must be uncovered from feature data.
- Bioinformatics, computational biology, and natural language processing, where ontological depth is intrinsic.
- Representation learning in geometric deep learning, particularly for embedding spaces beyond the Euclidean.
For scalable or high-dimensional regimes, further integration with approximate kernels and local affinity-preserving search structures—suitably adapted for negative curvature—remains a practical necessity. Additionally, data-driven or meta-learned selection of curvature parameters, possibly as part of an end-to-end geometric deep learning pipeline, represents an immediate extension.
Conclusion
The proposed HypeGBMS algorithm establishes a rigorous, convergent, and empirically validated non-parametric clustering framework designed for hyperbolic manifolds. By replacing every Euclidean step with its intrinsic hyperbolic generalization, the method captures hierarchical and tree-structured latent organization in data, a setting where flat-space density estimation and clustering fundamentally fail. The synthesis of statistical mode-seeking, gyrovector algebra, and careful geometric analysis renders HypeGBMS a benchmark for non-Euclidean unsupervised learning. The work lays the foundation for further algorithmic advances in geometric representation learning and analysis of complex, structured datasets.