Gravitational Federated Clustering (GFC)

Updated 7 December 2025
  • Gravitational Federated Clustering (GFC) is a framework that identifies cluster centers by detecting persistent singularities in a synthesized gravitational potential field from locally privatized centroids.
  • It integrates client-side ε-LDP with compactness-aware mass encoding to mitigate noise and heterogeneity issues typical in traditional distance-based federated clustering.
  • Empirical evaluations demonstrate significant improvements in clustering accuracy—up to 800% in ARI and NMI—highlighting GFC’s scalability and robustness under strict privacy constraints.

Gravitational Federated Clustering (GFC) is a framework for privacy-preserving cluster analysis in heterogeneous federated settings characterized by non-IID data and local differential privacy (LDP) constraints. GFC reformulates the global clustering problem as the identification of topologically persistent singularities in a synthetic gravitational potential field constructed from locally privatized, compactness-weighted centroids. This approach addresses fundamental limitations of conventional distance-based federated clustering methods, most notably their susceptibility to LDP-induced noise and sensitivity to data heterogeneity, and enables one-shot, non-iterative aggregation under stringent privacy constraints (Long et al., 30 Nov 2025).

1. Origin and Physical Motivation

GFC is rooted in the notion of "gravitational clustering" (GC), which models each data point as a mass exerting an attractive force in feature space, leading to emergent clusters as zones of high mass concentration (Binder et al., 2017). In distributed settings, GC has been realized by injecting mobile mass units that are attracted to fixed data points and undergo merge operations upon coalescent proximity, enabling adaptive clustering and cluster enumeration without explicit knowledge of cluster count. GC has shown empirical robustness to outliers, convergence in $O(10\!-\!100)$ steps per cluster, and near-linear scalability (Binder et al., 2017).

GFC extends this paradigm to federated learning with the introduction of privacy-preserving mechanisms and a topological aggregation phase, leveraging synthetic gravitational potential fields rather than dynamical particle simulations (Long et al., 30 Nov 2025).

2. Client-Side Compactness-Aware ε-LDP Mechanism

Each federated client $m$ with private dataset $D_m = \{x_{ij}\} \subset \mathbb{R}^d$ applies per-record $\epsilon$-LDP via the Laplace mechanism. Record-level noise $\eta_{ij}\sim \mathrm{Lap}(0,\Delta/\epsilon)$, where $\Delta \geq \max \|x\|_1$, is added to each coordinate: $\bar{x}_{ij} = x_{ij} + \eta_{ij}$. On this privatized dataset $\bar{D}_m$, the client executes $k$-means clustering (with $k$ possibly predetermined or adaptively chosen) to yield $k$ local clusters $C_{m,i}$ with centroids $c_{m,i}\in\mathbb{R}^d$.
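The per-record perturbation step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function names, the toy records, and the public bound `delta = 1.0` are assumptions made for the example.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Lap(0, scale) sample as the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def privatize_records(records, epsilon, sensitivity, rng=None):
    """Add independent Lap(0, sensitivity / epsilon) noise to every coordinate."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [[x + laplace_noise(scale, rng) for x in rec] for rec in records]

# Hypothetical toy data: three 2-D records held by one client.
records = [[0.2, 0.1], [0.3, -0.4], [-0.5, 0.5]]

# Assumed public bound on the L1 norm of any record (Delta in the text).
delta = 1.0

noisy = privatize_records(records, epsilon=1.0, sensitivity=delta,
                          rng=random.Random(0))
```

Smaller values of `epsilon` produce a larger noise scale `delta / epsilon`, which is the regime in which the later aggregation stages have to compensate for heavily perturbed inputs.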

To encode local cluster compactness, each centroid $c_{m,i}$ is assigned a mass

$$w_{m,i} = \exp\left(-\frac{1}{2\sigma_m^2}\sum_{\bar{x}\in C_{m,i}}\|\bar{x} - c_{m,i}\|^2\right),$$

where $\sigma_m^2$ is the empirical variance of pairwise distances in $D_m$. The mass $w_{m,i}$ thus decays with intra-cluster variance, giving higher weight to more tightly clustered structures. The pair $(c_{m,i}, w_{m,i})$ is transmitted to the server. Owing to the composition property of the Laplace mechanism, the set of $k$ centroids per client as a whole satisfies $\epsilon$-LDP at the client level (Long et al., 30 Nov 2025).
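To make the effect of the mass formula concrete, the sketch below evaluates it for a tight and a loose cluster. The helper names, the toy clusters, and the value `sigma_sq = 1.0` are illustrative assumptions; in the described method $\sigma_m^2$ is computed from the client's data.

```python
import math

def mean_point(points):
    """Coordinate-wise mean, used here as the cluster centroid."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def compactness_mass(cluster_points, centroid, sigma_sq):
    """w = exp(-(1 / (2 * sigma_sq)) * sum of squared distances to the centroid)."""
    sq_dist_sum = sum(
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for point in cluster_points
    )
    return math.exp(-sq_dist_sum / (2.0 * sigma_sq))

# Hypothetical clusters sharing the centroid (1, 1) but with different spread.
tight = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
loose = [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
sigma_sq = 1.0  # assumed value for illustration only

w_tight = compactness_mass(tight, mean_point(tight), sigma_sq)
w_loose = compactness_mass(loose, mean_point(loose), sigma_sq)
```

The tight cluster has a summed squared deviation of 0.04 and the loose one of 4, so the tight cluster receives the much larger mass and will dominate the server-side field near its centroid.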

3. Construction of the Gravitational Potential Field

Upon collection of privatized, mass-weighted centroids $S = \{(c_\alpha, w_\alpha)\}$, the server synthesizes a continuous gravitational potential field $\Phi:\mathbb{R}^d\rightarrow\mathbb{R}$ over a probe set $G$ of uniformly sampled points within the bounding box of $S$. The field at probe $g$ is defined as

$$\Phi(g) = E(g) = \sum_{(c_\alpha, w_\alpha)\in S} \frac{w_\alpha}{\|g - c_\alpha\|^p + \delta},$$

with $p = 2$ (Coulomb-like), and $\delta>0$ as a regularization constant to avoid singularities. Dense, heavily weighted clusters yield prominent peaks in $\Phi$, representing candidate global cluster centers. This approach eliminates direct reliance on noisy pairwise distances, instead aggregating global structure via the potential field formalism (Long et al., 30 Nov 2025).
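A direct evaluation of the field on a uniformly sampled probe set might look like the sketch below. The function names, the toy centroid set `S`, the probe count, and `delta = 1e-2` are assumptions for illustration; the source does not specify them.

```python
import random

def potential(g, weighted_centroids, p=2, delta=1e-2):
    """Phi(g) = sum over (c, w) of w / (||g - c||^p + delta)."""
    total = 0.0
    for c, w in weighted_centroids:
        dist = sum((gi - ci) ** 2 for gi, ci in zip(g, c)) ** 0.5
        total += w / (dist ** p + delta)
    return total

def sample_probes(weighted_centroids, n_probes, rng):
    """Sample probes uniformly inside the bounding box of the centroids."""
    dim = len(weighted_centroids[0][0])
    lows = [min(c[d] for c, _ in weighted_centroids) for d in range(dim)]
    highs = [max(c[d] for c, _ in weighted_centroids) for d in range(dim)]
    return [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
            for _ in range(n_probes)]

# Hypothetical (centroid, mass) pairs received from clients.
S = [([0.0, 0.0], 0.9), ([0.1, 0.0], 0.8), ([5.0, 5.0], 0.7)]

probes = sample_probes(S, 200, random.Random(0))
field = [potential(g, S) for g in probes]
```

Two nearby centroids reinforce each other, so the field near the origin is far higher than at the midpoint of the box, which is the behavior the peak-detection stage relies on.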

4. Topological Aggregation via Persistent Homology

The extraction of robust cluster centroids from the noisy potential field leverages persistent homology on superlevel sets. For descending thresholds $h$ from $\max\Phi$ to $\min\Phi$, the superlevel set $F_h = \{g\in G:\Phi(g)\geq h\}$ is constructed. The $0$th homology $\pi_0(F_h)$, representing the connected components via an $r$-neighborhood graph on $G$, is computed for each $h$.

Persistent homology tracks the birth and death of these $0$-dimensional features. True clusters correspond to components with large persistence intervals (significant death–birth gaps). For each persistent component (leaf $L$ in the merge tree $T$), an energy-weighted centroid is calculated:

$$\mu_L = \frac{\sum_{g\in L} \Phi(g)\cdot g}{\sum_{g\in L} \Phi(g)}.$$

The global centroids are then given by the $n_c$ leaves with the highest persistence values. This topological filtering enables the identification of stable, noise-resistant cluster centers, overcoming the adverse effects of privatization noise and non-IID heterogeneity (Long et al., 30 Nov 2025).
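The superlevel-set filtration can be sketched with a union-find structure over the probes, processed from highest to lowest field value. This is a simplified illustration under stated assumptions, not the paper's procedure: the neighbor search is brute force, and each probe is assigned to the component it joins at insertion time, which stands in for membership of a merge-tree leaf. All names and the 1-D toy field are hypothetical.

```python
def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def persistent_centroids(probes, phi, radius, n_centers):
    """Return up to n_centers (centroid, persistence) pairs, most persistent first."""
    order = sorted(range(len(probes)), key=lambda i: -phi[i])  # descending h
    parent, owner, death = {}, {}, {}
    r2 = radius ** 2

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:
        # Components already in the superlevel set that lie within radius of probe i.
        roots = {find(j) for j in parent if sq_dist(probes[i], probes[j]) <= r2}
        parent[i] = i
        if not roots:
            owner[i] = i  # a new component is born at height phi[i]
            continue
        # Elder rule: the component with the highest peak survives a merge.
        elder = max(roots, key=lambda r: phi[r])
        for r in roots - {elder}:
            death[r] = phi[i]  # the younger component dies at this height
            parent[r] = elder
        parent[i] = elder
        owner[i] = elder  # simplification: attribute probe i to the surviving peak

    floor = min(phi)
    births = [i for i in owner if owner[i] == i]
    persistence = {b: phi[b] - death.get(b, floor) for b in births}
    top = sorted(births, key=lambda b: -persistence[b])[:n_centers]

    results = []
    for b in top:
        members = [i for i in owner if owner[i] == b]
        total = sum(phi[i] for i in members)  # assumes phi > 0, as in the field above
        dim = len(probes[b])
        centroid = [sum(phi[i] * probes[i][d] for i in members) / total
                    for d in range(dim)]
        results.append((centroid, persistence[b]))
    return results

# Hypothetical 1-D field with two bumps, near x = 2 and x = 8.
probes = [[0.5 * k] for k in range(21)]
phi = [1.0 / ((g[0] - 2.0) ** 2 + 0.1) + 0.8 / ((g[0] - 8.0) ** 2 + 0.1)
       for g in probes]
centers = persistent_centroids(probes, phi, radius=0.6, n_centers=2)
```

In this toy field the two components merge only at the valley around $x = 5$, so both peaks have large persistence and the energy-weighted centroids land close to the two bumps. Spurious low peaks caused by noise would merge almost immediately and be ranked below them.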

5. Theoretical Properties: Privacy–Error Bounds and Noise Attenuation

GFC achieves a provable trade-off between privacy and accuracy. The error in the estimated global centroid $\hat{c}$ relative to the true centroid $c^*$ can be decomposed into positional noise $O(1/\epsilon)$, arising from the Laplace mechanism, and mass perturbation noise $O(1/\epsilon^2)$. The topological aggregation stage filters out most mass-based fluctuations, so that

$$\mathbb{E}[\|\hat{c} - c^*\|] \leq C \cdot \frac{1}{\epsilon},$$

where $C=O(1)$ depends on data dimensionality and cluster size. This closed-form bound is derived by tracking how local centroid shifts aggregate through the persistent homology filtration (Long et al., 30 Nov 2025).
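For intuition about the $1/\epsilon$ scaling of the positional term, the standard moments of a single Laplace draw with scale $b = \Delta/\epsilon$ are shown below. This is a general property of the Laplace distribution, included as background; it is not the paper's derivation of the constant $C$.

```latex
\eta \sim \mathrm{Lap}(0, b),\quad b = \frac{\Delta}{\epsilon}
\;\Longrightarrow\;
\mathbb{E}\,|\eta| = b = \frac{\Delta}{\epsilon},
\qquad
\operatorname{Var}(\eta) = 2b^2 = \frac{2\Delta^2}{\epsilon^2}.
```

Halving $\epsilon$ therefore doubles the expected absolute perturbation of each coordinate, consistent with a positional error that grows as $1/\epsilon$.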

The gravitational potential field $\Phi$ satisfies a global Lipschitz condition, with constant

$$L = p\cdot\max_\alpha w_\alpha\cdot\delta^{-2}\cdot\sum_\alpha w_\alpha,$$

ensuring that for $y,z\in\mathbb{R}^d$, $|\Phi(y)-\Phi(z)|\leq L\|y-z\|$. In high-density regions, local noise in $c_\alpha$ or $w_\alpha$ is exponentially attenuated: a perturbation $\eta$ shifts $\Phi$ by at most $O(\eta e^{-\kappa N})$, where $N$ is the number of points in the cluster. This smoothing property fundamentally differentiates GFC from distance-based methods under strong LDP (Long et al., 30 Nov 2025).

6. Empirical Evaluation and Comparative Performance

GFC has been benchmarked against K-Fed, MUFC, NN-FC (one-shot baselines), and iterative DP-Lloyds methods on ten datasets with varying client counts and privacy budgets. Under strong privacy ($\epsilon<1$), GFC yields Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) improvements of $30$–$800\%$ over baselines. For example, on MNIST at $\epsilon=0.01$, GFC achieves $\mathrm{ARI}\approx0.53$, while competing methods yield near-zero ARI.

Ablation studies confirm the necessity of the mass encoding: omitting it drops ARI by $40\%$. Robustness of the topological phase is verified by varying $\delta$ and the probe point multiplier $\alpha$; stable centroids consistently emerge across wide ranges. Scalability to $1000$ clients demonstrates that GFC is uniquely able to preserve coherent clusters under large-scale, high-privacy regimes (Long et al., 30 Nov 2025).

The gravitational paradigm in clustering—treating data as mass in feature space—originates with Binder et al.'s Gravitational Clustering (GC), which emphasizes adaptive clustering, cluster enumeration, and decentralized diffusion–adaptation schemes in wireless sensor networks. GC features mobile mass units subject to Newtonian or generalized attractive forces, periodic fusion, and robust local leadership via mass thresholding (Binder et al., 2017). GFC's topological extension and privacy-preserving formulation generalize these ideas for federated, LDP-privatized, non-IID data, replacing mobile-unit dynamics with potential fields and persistent homology on synthetic probe sets (Long et al., 30 Nov 2025).

A plausible implication is that GFC's structural separation of local (privacy-constrained) summarization and global (physics-inspired, topology-driven) aggregation may serve as a blueprint for future privacy-preserving distributed learning methods, enabling flexible accommodation of privacy budgets, cluster heterogeneity, and communication constraints.
