Hyperbolic Gaussian Blurring Mean Shift: A Statistical Mode-Seeking Framework for Clustering in Curved Spaces

Published 12 Dec 2025 in cs.LG and stat.ML | (2512.11448v1)

Abstract: Clustering is a fundamental unsupervised learning task for uncovering patterns in data. While Gaussian Blurring Mean Shift (GBMS) has proven effective for identifying arbitrarily shaped clusters in Euclidean space, it struggles with datasets exhibiting hierarchical or tree-like structures. In this work, we introduce HypeGBMS, a novel extension of GBMS to hyperbolic space. Our method replaces Euclidean computations with hyperbolic distances and employs Möbius-weighted means to ensure that all updates remain consistent with the geometry of the space. HypeGBMS effectively captures latent hierarchies while retaining the density-seeking behavior of GBMS. We provide theoretical insights into convergence and computational complexity, along with empirical results that demonstrate improved clustering quality in hierarchical datasets. This work bridges classical mean-shift clustering and hyperbolic representation learning, offering a principled approach to density-based clustering in curved spaces. Extensive experimental evaluations on $11$ real-world datasets demonstrate that HypeGBMS significantly outperforms conventional mean-shift clustering methods in non-Euclidean settings, underscoring its robustness and effectiveness.

Summary

  • The paper introduces HypeGBMS, a hyperbolic extension of GBMS that leverages Möbius gyrovector operations in the Poincaré ball model.
  • It demonstrates rigorous convergence and statistical consistency, outperforming traditional Euclidean clustering methods on hierarchical datasets.
  • Empirical results on real-world benchmarks and image segmentation tasks highlight improved ARI/NMI and refined hierarchical mode detection.

Hyperbolic Gaussian Blurring Mean Shift: Mode-Seeking Clustering in Non-Euclidean Geometries

Introduction

The paper "Hyperbolic Gaussian Blurring Mean Shift: A Statistical Mode-Seeking Framework for Clustering in Curved Spaces" (2512.11448) introduces HypeGBMS, a principled extension of the Gaussian Blurring Mean Shift (GBMS) algorithm for clustering in hyperbolic spaces, specifically within the Poincaré ball model. The motivation originates from the limitations of classical Euclidean clustering—especially density-based approaches like mean shift—on data with inherent hierarchical or tree-like structure. Hyperbolic geometry, through exponential volume growth and negative curvature, admits embeddings and metric relations that are highly compatible with hierarchical structures, as manifest for instance in taxonomies or scale-free networks. This work advances a framework for non-parametric, mode-seeking clustering on data manifolds endowed with negative curvature, ensuring intrinsic consistency, convergent dynamics, and computational tractability.

Methodology

Hyperbolic Geometry and Algebraic Structures

The foundation of HypeGBMS is established in the Poincaré ball model $\mathbb{D}_c^p$, parameterized by curvature $c < 0$. All mean computations, updates, and distances are cast in terms of operations that are intrinsic to hyperbolic space: Möbius gyrovector addition, scalar multiplication, and geodesic exponential/logarithm maps. The algorithm replaces all Euclidean computations in GBMS with their hyperbolic analogues, specifically:

  • Pairwise Hyperbolic Distances: Used for Gaussian kernel weights, associated with each data point and its neighbors.
  • Möbius-weighted Means: Updating each point toward the local mean under hyperbolic geometry, ensuring iterates remain in the Poincaré ball.
  • Curvature Control: The curvature parameter $c$ modulates how strongly hierarchical effects manifest.

This construction guarantees that the mean shift flow respects the negative curvature and the non-Euclidean metric, a requirement for statistical mode-finding when the geometry of the underlying data manifold deviates from flatness.
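The two geometric primitives named above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes the common convention in which the ball has curvature $-\kappa$ with $\kappa = -c > 0$, uses the standard closed forms for Möbius addition and the induced distance, and the function names `mobius_add`, `hyp_dist`, and `gaussian_weight` are hypothetical.

```python
import numpy as np

def mobius_add(x, y, kappa=1.0):
    """Mobius addition on the Poincare ball of curvature -kappa (kappa > 0).

    Assumption: kappa = -c, where c < 0 is the curvature used in the text.
    """
    xy = np.sum(x * y, axis=-1, keepdims=True)
    x2 = np.sum(x * x, axis=-1, keepdims=True)
    y2 = np.sum(y * y, axis=-1, keepdims=True)
    num = (1 + 2 * kappa * xy + kappa * y2) * x + (1 - kappa * x2) * y
    den = 1 + 2 * kappa * xy + kappa**2 * x2 * y2
    return num / den

def hyp_dist(x, y, kappa=1.0):
    """Geodesic distance: (2 / sqrt(kappa)) * artanh(sqrt(kappa) * ||(-x) (+) y||)."""
    sqrt_k = np.sqrt(kappa)
    diff_norm = np.linalg.norm(mobius_add(-x, y, kappa), axis=-1)
    # Clip so that rounding error cannot push the argument to 1 or beyond.
    arg = np.clip(sqrt_k * diff_norm, 0.0, 1.0 - 1e-12)
    return 2.0 / sqrt_k * np.arctanh(arg)

def gaussian_weight(x, y, sigma, kappa=1.0):
    """Gaussian kernel weight evaluated on the hyperbolic distance."""
    return np.exp(-hyp_dist(x, y, kappa) ** 2 / (2 * sigma**2))
```

With $\kappa = 1$, a point at Euclidean norm 0.5 lies at hyperbolic distance $2\,\mathrm{artanh}(0.5) = \ln 3$ from the origin, which makes a convenient sanity check for an implementation of this kind.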

Algorithmic Structure

HypeGBMS consists of three stages: projection of input data onto the hyperbolic ball, iterative mode-seeking updates using Möbius weighted means, and cluster assignment by finding connected components in the induced similarity graph. Each update is guaranteed to remain in the admissible region of the manifold, with step sizes and pairwise influences reweighted by the hyperbolic Gaussian kernel.
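The three stages can be sketched as follows. This is an illustrative reading of the description, not the paper's reference implementation, and several choices are assumptions: `kappa = -c > 0`, projection via the exponential map at the origin, the Möbius gyromidpoint as the "Möbius weighted mean", a maximum-shift stopping rule, and a hypothetical `merge_radius` threshold for building the similarity graph.

```python
import numpy as np

def project_to_ball(X, kappa=1.0, eps=1e-5):
    """Stage 1 (assumed): exponential map at the origin, clipped inside the ball."""
    sqrt_k = np.sqrt(kappa)
    norm = np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-15)
    Y = np.tanh(sqrt_k * norm) * X / (sqrt_k * norm)
    ynorm = np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-15)
    return Y * np.minimum(1.0, (1 - eps) / (sqrt_k * ynorm))

def pairwise_dist(Z, kappa=1.0):
    """All pairwise Poincare distances via the closed-form arcosh expression."""
    sq = np.sum(Z * Z, axis=1)
    diff2 = np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0.0)
    denom = (1 - kappa * sq)[:, None] * (1 - kappa * sq)[None, :]
    return np.arccosh(np.maximum(1 + 2 * kappa * diff2 / denom, 1.0)) / np.sqrt(kappa)

def mobius_weighted_mean(Z, W, kappa=1.0):
    """Row i is the Mobius gyromidpoint of the rows of Z under weights W[i]."""
    lam = 2.0 / (1.0 - kappa * np.sum(Z * Z, axis=1))    # conformal factors
    v = ((W * lam) @ Z) / (W * (lam - 1.0)).sum(axis=1, keepdims=True)
    # Mobius scalar multiplication by 1/2.
    sqrt_k = np.sqrt(kappa)
    vnorm = np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-15)
    arg = np.clip(sqrt_k * vnorm, 0.0, 1.0 - 1e-12)
    return np.tanh(0.5 * np.arctanh(arg)) * v / (sqrt_k * vnorm)

def hype_gbms(X, sigma=0.5, kappa=1.0, n_iter=50, tol=1e-6, merge_radius=1e-2):
    Z = project_to_ball(np.asarray(X, dtype=float), kappa)
    # Stage 2: blurring updates; every point moves to its kernel-weighted mean.
    for _ in range(n_iter):
        W = np.exp(-pairwise_dist(Z, kappa) ** 2 / (2 * sigma**2))
        Z_new = mobius_weighted_mean(Z, W, kappa)
        shift = np.max(np.linalg.norm(Z_new - Z, axis=1))
        Z = Z_new
        if shift < tol:
            break
    # Stage 3: connected components of the graph linking points closer than merge_radius.
    adj = pairwise_dist(Z, kappa) < merge_radius
    labels = -np.ones(len(Z), dtype=int)
    current = 0
    for i in range(len(Z)):
        if labels[i] >= 0:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u] & (labels < 0)):
                labels[v] = current
                stack.append(v)
        current += 1
    return labels, Z
```

Each pass builds an $N \times N$ distance matrix, which is where the quadratic cost per iteration comes from. As with Euclidean GBMS, running the blurring loop for too long can merge distinct clusters, so the stopping rule matters in practice.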

The procedure's complexity is $\mathcal{O}(T N^2 p)$, dominated by the repeated computation of all pairwise hyperbolic distances and mean updates. The use of gyrovector algebra ensures closed-form, numerically stable updates.

Theoretical Analysis

Convergence and Consistency

The paper provides a rigorous convergence proof for HypeGBMS, leveraging the geometry of the Poincaré ball:

  • Convergence under Compactness: If all iterates lie in a compact, convex region, the mode-seeking flow (a Riemannian gradient ascent on the hyperbolic KDE) monotonically increases the kernel density and converges to stationary points. The inexactness of Möbius means relative to the Riemannian Fréchet mean vanishes as the curvature or the local cluster diameter goes to zero.
  • Statistical Consistency: Under classical kernel density convergence conditions ($N \rightarrow \infty$, kernel bandwidth $\sigma_N \rightarrow 0$, $N \sigma_N^p \rightarrow \infty$), the estimated modes in HypeGBMS converge (in probability) to the true density modes in the hyperbolic manifold.
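To make the bandwidth conditions concrete, the short sketch below checks them for one schedule. The rate $\sigma_N = N^{-1/(p+4)}$ is the classical Euclidean KDE rule of thumb and is used here purely as an assumed example; the summary does not say which schedule the paper uses, and `bandwidth` is a hypothetical helper.

```python
def bandwidth(n, p):
    """Assumed schedule sigma_N = N^(-1/(p+4)); not taken from the paper."""
    return n ** (-1.0 / (p + 4))

p = 2
for n in (10**2, 10**4, 10**6):
    sigma = bandwidth(n, p)
    # sigma shrinks toward 0 while n * sigma^p = n^(4/(p+4)) keeps growing.
    print(n, sigma, n * sigma**p)
```

Under this schedule $N \sigma_N^p = N^{4/(p+4)}$, so both conditions hold at once: the kernel narrows, yet the effective number of samples under each kernel still diverges.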

Approximation and Scalability

While quadratic in $N$ in its basic form, HypeGBMS can be made scalable by employing approximation methods from the kernel literature (Nyström, random feature expansions) and fast nearest neighbor search structures generalized to hyperbolic space.

Empirical Evaluation

Performance on Real-World Datasets

HypeGBMS is benchmarked on 11 datasets with hierarchical or complex latent structure, including UCI and image datasets. It consistently outperforms the baseline methods (classical $k$-means, DBSCAN, Gaussian mean-shift, and various accelerated or weighted mean-shift variants) on ARI and NMI in both small- and large-scale settings. The advantage is most pronounced on inherently hierarchical data (e.g., Phishing URL, Zoo, ORHD), where ARI/NMI improvements sometimes exceed 0.2 over competitors.
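For readers unfamiliar with the reported metric, the Adjusted Rand Index can be computed from the contingency counts of two labelings. The sketch below is a generic standard-library implementation of the usual formula, not the paper's evaluation code; `adjusted_rand_index` is a hypothetical name.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI = (index - expected index) / (max index - expected index)."""
    n = len(labels_true)
    # Number of co-clustered pairs in the joint table and in each marginal.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

The score is invariant to relabeling, equals 1 for identical partitions, and is close to 0 for random agreement, so a gain of 0.2 is a substantial change in partition quality.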

The capacity to separate clusters that represent different levels of hierarchy, as opposed to only compact Euclidean masses, is visually corroborated via t-SNE projections of clustering structure.

Figure 1: t-SNE visualizations on Glass, ORHD, and Phishing URL; HypeGBMS captures finer-grained hierarchical modes compared to standard GBMS.

Qualitative Results in Image Segmentation

On BSDS500 and PASCAL VOC 2012, two canonical image segmentation benchmarks, HypeGBMS aligns with and enhances region boundaries beyond the reach of traditional mean-shift or grid-based algorithms.

Figure 2: BSDS500 segmentation: HypeGBMS yields cleaner, more semantically consistent regions compared to Euclidean methods.

Figure 3: PASCAL VOC 2012 sample: HypeGBMS produces clusters better aligned with real compositional boundaries in complex images.

Parameter and Ablation Analysis

The paper provides a thorough investigation of how the Gaussian kernel bandwidth $\sigma$ and the curvature $c$ affect clustering metrics (ARI/NMI). Cluster fidelity is highly sensitive to both, with the best results typically obtained for $c$ between $-0.6$ and $-1.0$ and a moderate kernel bandwidth.

Figure 4: ARI/NMI sweeps over bandwidth for various curvatures; strong performance requires proper tuning, especially with negative curvature.

Figure 5: ARI/NMI as a function of curvature; strongly negative curvature (smaller $c$) yields the best hierarchical separation.

Implications and Future Directions

The formalism and empirical evidence behind HypeGBMS strengthen the case for non-Euclidean clustering frameworks when the data exhibit natural hierarchy, partial order, or exponential expansion. The approach extends mean-shift's versatility from flat to curved geometries, making it broadly relevant to:

  • Network and taxonomy analysis where latent tree-like relations must be uncovered from feature data.
  • Bioinformatics, computational biology, and natural language processing, where ontological depth is intrinsic.
  • Representation learning in geometric deep learning, particularly for embedding spaces beyond the Euclidean.

For scalable or high-dimensional regimes, further integration with approximate kernels and local affinity-preserving search structures—suitably adapted for negative curvature—remains a practical necessity. Additionally, data-driven or meta-learned selection of curvature parameters, possibly as part of an end-to-end geometric deep learning pipeline, represents an immediate extension.

Conclusion

The proposed HypeGBMS algorithm establishes a rigorous, convergent, and empirically validated non-parametric clustering framework designed for hyperbolic manifolds. By replacing every Euclidean step with its intrinsic hyperbolic generalization, the method captures hierarchical and tree-structured latent organization in data, a setting where flat-space density estimation and clustering struggle. The synthesis of statistical mode-seeking, gyrovector algebra, and careful geometric analysis makes HypeGBMS a strong reference point for non-Euclidean unsupervised learning. The work lays the foundation for further algorithmic advances in geometric representation learning and analysis of complex, structured datasets.
