Gravitational Federated Clustering (GFC)
- Gravitational Federated Clustering (GFC) is a framework that identifies cluster centers by detecting persistent singularities in a synthesized gravitational potential field from locally privatized centroids.
- It integrates client-side ε-LDP with compactness-aware mass encoding to mitigate noise and heterogeneity issues typical in traditional distance-based federated clustering.
- Empirical evaluations demonstrate significant improvements in clustering accuracy—up to 800% in ARI and NMI—highlighting GFC’s scalability and robustness under strict privacy constraints.
Gravitational Federated Clustering (GFC) is a framework for privacy-preserving cluster analysis in heterogeneous federated settings characterized by non-IID data and local differential privacy (LDP) constraints. GFC reformulates the global clustering problem as the identification of topologically persistent singularities in a synthetic gravitational potential field constructed from locally privatized, compactness-weighted centroids. This approach addresses fundamental limitations of conventional distance-based federated clustering methods, most notably their susceptibility to LDP-induced noise and sensitivity to data heterogeneity, and enables one-shot, non-iterative aggregation under stringent privacy constraints (Long et al., 30 Nov 2025).
1. Origin and Physical Motivation
GFC is rooted in the notion of "gravitational clustering" (GC), which models each data point as a mass exerting an attractive force in feature space, so that clusters emerge as zones of high mass concentration (Binder et al., 2017). In distributed settings, GC has been realized by injecting mobile mass units that are attracted to fixed data points and merge once they come sufficiently close to one another, enabling adaptive clustering and cluster enumeration without prior knowledge of the cluster count. GC has shown empirical robustness to outliers, fast per-cluster convergence, and near-linear scalability (Binder et al., 2017).
GFC extends this paradigm to federated learning with the introduction of privacy-preserving mechanisms and a topological aggregation phase, leveraging synthetic gravitational potential fields rather than dynamical particle simulations (Long et al., 30 Nov 2025).
2. Client-Side Compactness-Aware ε-LDP Mechanism
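Each federated client $i$ with private dataset $D_i$ applies per-record $\varepsilon$-LDP via the Laplace mechanism. Record-level noise $\eta \sim \mathrm{Lap}(b)$, where the scale $b$ is calibrated to the privacy budget $\varepsilon$ (the standard Laplace mechanism uses $b = \Delta/\varepsilon$ for sensitivity $\Delta$), is added to each coordinate: $\tilde{x} = x + \eta$. On this privatized dataset $\tilde{D}_i$, the client executes $k$-means clustering (with $k$ possibly predetermined or adaptively chosen) to yield local clusters $C_{i,1}, \dots, C_{i,k}$ with centroids $c_{i,1}, \dots, c_{i,k}$.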
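The record-level privatization step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `sensitivity` parameter, and the calibration `scale = sensitivity / epsilon` (the textbook Laplace mechanism) are assumptions, and features are presumed to be pre-scaled to a known range.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def privatize_records(records, epsilon, sensitivity=1.0, seed=0):
    """Add independent Laplace noise to every coordinate of every record.

    Assumption: scale = sensitivity / epsilon (textbook calibration);
    the calibration used in the paper may differ.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [[x + laplace_noise(scale, rng) for x in rec] for rec in records]

# Toy client dataset with two features per record.
data = [[0.1, 0.2], [0.15, 0.25], [0.9, 0.8]]
noisy = privatize_records(data, epsilon=1.0)
```

Smaller values of `epsilon` give a larger noise scale, which is why local centroids computed on `noisy` become unreliable in the strong-privacy regime.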
To encode local cluster compactness, each centroid $c_{i,j}$ is assigned a mass $m_{i,j}$ computed from $\sigma_{i,j}^2$, the empirical variance of pairwise distances in cluster $C_{i,j}$. The mass decays with intra-cluster variance, giving higher weight to more tightly clustered structures. Each pair $(c_{i,j}, m_{i,j})$ is transmitted to the server. Owing to the composition property of the Laplace mechanism, the set of centroids per client as a whole satisfies $\varepsilon$-LDP at the client level (Long et al., 30 Nov 2025).
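The compactness-aware summary a client sends can be illustrated with a small sketch. The mass formula used here, cluster size divided by one plus the pairwise-distance variance, is a hypothetical stand-in chosen only because it decays with variance; the exact expression is the one given in Long et al. All function names are illustrative.

```python
import math
from itertools import combinations

def centroid(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def pairwise_distance_variance(points):
    """Population variance of all pairwise Euclidean distances in a cluster."""
    dists = [math.dist(a, b) for a, b in combinations(points, 2)]
    if not dists:
        return 0.0
    mean = sum(dists) / len(dists)
    return sum((d - mean) ** 2 for d in dists) / len(dists)

def compactness_mass(points):
    # Hypothetical mass: cluster size damped by the pairwise-distance variance.
    return len(points) / (1.0 + pairwise_distance_variance(points))

# Two local clusters of equal size: one tight, one spread out.
tight = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]]
loose = [[0.0, 0.0], [3.0, 0.0], [0.0, 1.0], [5.0, 4.0]]

# (centroid, mass) pairs, as they would be sent to the server.
summary = [(centroid(c), compactness_mass(c)) for c in (tight, loose)]
```

With this stand-in, the tight cluster receives the larger mass even though both clusters contain four points, matching the intended weighting toward compact structure.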
3. Construction of the Gravitational Potential Field
Upon collection of the privatized, mass-weighted centroids $\{(c_j, m_j)\}$, the server synthesizes a continuous gravitational potential field $\Phi$ over a probe set $P$ of uniformly sampled points within the bounding box of the centroids. The field value $\Phi(p)$ at probe $p \in P$ sums the contributions of all centroids, each scaled by its mass $m_j$ and decaying with the distance $\|p - c_j\|$ according to a Coulomb-like kernel, with a regularization constant to avoid singularities. Dense, heavily weighted clusters yield prominent peaks in $\Phi$, representing candidate global cluster centers. This approach eliminates direct reliance on noisy pairwise distances, instead aggregating global structure via the potential field formalism (Long et al., 30 Nov 2025).
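A server-side field evaluation might look like the sketch below. The kernel `m / (r**alpha + delta)` with `alpha = 1`, the parameter names `alpha` and `delta`, and the probe count are assumptions made for illustration; the paper's exact kernel and regularization may be written differently.

```python
import math
import random

def bounding_box(centroids):
    dims = range(len(centroids[0]))
    lo = [min(c[d] for c in centroids) for d in dims]
    hi = [max(c[d] for c in centroids) for d in dims]
    return lo, hi

def sample_probes(centroids, n_probes, seed=0):
    """Uniformly sample probe points inside the centroids' bounding box."""
    rng = random.Random(seed)
    lo, hi = bounding_box(centroids)
    return [[rng.uniform(a, b) for a, b in zip(lo, hi)] for _ in range(n_probes)]

def potential(probe, centroids, masses, alpha=1.0, delta=0.1):
    # Assumed Coulomb-like kernel; delta > 0 keeps the value finite
    # even when the probe coincides with a centroid.
    return sum(m / (math.dist(probe, c) ** alpha + delta)
               for c, m in zip(centroids, masses))

# Two nearby heavy centroids and one isolated light centroid.
centroids = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]]
masses = [4.0, 3.5, 1.0]

probes = sample_probes(centroids, 200)
field = [potential(p, centroids, masses) for p in probes]
```

Probes near the two heavy, adjacent centroids receive much larger field values than probes in the empty middle of the bounding box, which is the peak structure the aggregation step later exploits.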
4. Topological Aggregation via Persistent Homology
The extraction of robust cluster centroids from the noisy potential field $\Phi$ leverages persistent homology on superlevel sets. For descending thresholds $\tau$ from $\max_p \Phi(p)$ to $\min_p \Phi(p)$, the superlevel set $S_\tau = \{p : \Phi(p) \ge \tau\}$ is constructed. The $0$th homology $H_0(S_\tau)$, representing the connected components via a fixed-radius neighborhood graph on $S_\tau$, is computed for each $\tau$.
Persistent homology tracks the birth and death of these $0$-dimensional features. True clusters correspond to components with large persistence intervals (a large gap between birth and death thresholds). For each persistent component (a leaf in the merge tree), an energy-weighted centroid is calculated as the potential-weighted mean of the probe points in that component. The global centroids are then given by the leaves with the highest persistence values. This topological filtering enables the identification of stable, noise-resistant cluster centers, overcoming the adverse effects of privatization noise and non-IID heterogeneity (Long et al., 30 Nov 2025).
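The superlevel-set filtration described above can be sketched with a union-find structure. This is a simplified illustration rather than the paper's procedure: `persistent_peaks` and `radius` are hypothetical names, the neighbor search is quadratic, field values are assumed positive, and each probe is credited to the component it joins on insertion, so a surviving peak's basin also collects the low-valued probes added after a merge.

```python
import math

def persistent_peaks(probes, field, radius):
    """0-dimensional superlevel-set persistence on a radius-neighborhood graph.

    Returns (persistence, field-weighted centroid) pairs, most persistent first.
    """
    order = sorted(range(len(probes)), key=lambda i: -field[i])
    parent, birth, death, basin = {}, {}, {}, {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:  # sweep thresholds from the highest field value down
        roots = {find(j) for j in parent
                 if math.dist(probes[i], probes[j]) <= radius}
        if not roots:
            # No inserted neighbor: a new component (local peak) is born.
            parent[i], birth[i], basin[i] = i, field[i], [i]
            continue
        # Elder rule: the component with the highest birth value survives.
        elder = max(roots, key=lambda r: birth[r])
        parent[i] = elder
        basin[elder].append(i)
        for r in roots - {elder}:
            parent[r] = elder
            death[r] = field[i]  # younger component dies at this level

    floor = min(field)  # components that never merge die at the lowest level
    dim = len(probes[0])
    result = []
    for peak, members in basin.items():
        total = sum(field[m] for m in members)
        center = [sum(field[m] * probes[m][d] for m in members) / total
                  for d in range(dim)]
        result.append((birth[peak] - death.get(peak, floor), center))
    result.sort(key=lambda t: -t[0])
    return result

# 1-D toy field with a strong bump near x = 1 and a weaker one near x = 5.
probes = [[0.1 * k] for k in range(61)]
field = [2.0 * math.exp(-(p[0] - 1.0) ** 2)
         + math.exp(-(p[0] - 5.0) ** 2) + 0.01 for p in probes]
peaks = persistent_peaks(probes, field, radius=0.15)
top_two = [center for _, center in peaks[:2]]
```

On this toy field the sweep finds two components, and their weighted centroids land close to the two bumps, with the stronger bump carrying the larger persistence.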
5. Theoretical Properties: Privacy–Error Bounds and Noise Attenuation
GFC achieves a provable trade-off between privacy and accuracy. The error in the estimated global centroid relative to the true centroid can be decomposed into positional noise, arising from the Laplace mechanism, and mass perturbation noise. The topological aggregation stage filters out most mass-based fluctuations, so that the centroid error admits a closed-form bound governed by the positional term, with a constant that depends on data dimensionality and cluster size. This bound is derived by tracking how local centroid shifts aggregate through the persistent homology filtration (Long et al., 30 Nov 2025).
The gravitational potential field $\Phi$ satisfies a global Lipschitz condition with a constant $L$, ensuring that for any two points $p$ and $q$, $|\Phi(p) - \Phi(q)| \le L \|p - q\|$. In high-density regions, local noise in a centroid position or mass is exponentially attenuated: the resulting shift in $\Phi$ is bounded by a quantity that shrinks as $n$, the number of points in the cluster, grows. This smoothing property fundamentally differentiates GFC from distance-based methods under strong LDP (Long et al., 30 Nov 2025).
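As a worked illustration of where a Lipschitz constant of this kind can come from, the derivation below assumes a specific kernel, $\Phi(p) = \sum_j m_j / (\|p - c_j\| + \delta)$ with nonnegative masses $m_j$ and $\delta > 0$. The kernel form and the symbols $c_j$, $m_j$, $\delta$ are assumptions made for this sketch, and the constant stated in Long et al. may differ.

```latex
\begin{aligned}
\bigl|\Phi(p) - \Phi(q)\bigr|
  &\le \sum_j m_j \left| \frac{1}{\|p - c_j\| + \delta}
                       - \frac{1}{\|q - c_j\| + \delta} \right| \\
  &= \sum_j m_j \,
     \frac{\bigl|\, \|q - c_j\| - \|p - c_j\| \,\bigr|}
          {(\|p - c_j\| + \delta)\,(\|q - c_j\| + \delta)} \\
  &\le \frac{\sum_j m_j}{\delta^{2}} \, \|p - q\| .
\end{aligned}
```

The last step uses the reverse triangle inequality in the numerator and the fact that each denominator factor is at least $\delta$. Under this assumed kernel, a larger regularization constant gives a smoother field at the cost of flatter peaks.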
6. Empirical Evaluation and Comparative Performance
GFC has been benchmarked against the one-shot baselines K-Fed, MUFC, and NN-FC, and against iterative DP-Lloyds methods, on ten datasets with varying client counts and privacy budgets. Under strong privacy (small $\varepsilon$), GFC yields Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) improvements of $30$–$800\%$ over baselines. For example, on MNIST under a strict privacy budget, competing methods yield near-zero ARI while GFC retains a substantially higher score.
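For readers unfamiliar with the headline metric, the following sketch computes ARI from the standard pair-counting formula. It is a generic reference implementation, not code from the evaluation, and the function name is illustrative.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index via the pair-counting (contingency table) formula."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two label lists of equal length >= 2")
    # Pairs of points that share a cell, a true cluster, or a predicted cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # chance-level agreement
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (sum_cells - expected) / (max_index - expected)

perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # same partition, renamed labels
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])   # partitions that disagree
```

ARI is invariant to label permutation and is corrected for chance, so a near-zero value means the recovered partition is no better than a random one.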
Ablation studies confirm the necessity of the mass encoding: omitting the compactness-aware mass weighting lowers ARI. Robustness of the topological phase is verified by varying its hyperparameters, including the probe point multiplier; stable centroids consistently emerge across wide ranges. Scalability experiments with $1000$ clients indicate that, among the compared methods, GFC alone preserves coherent clusters under large-scale, high-privacy regimes (Long et al., 30 Nov 2025).
7. Related Approaches and Extensions
The gravitational paradigm in clustering—treating data as mass in feature space—originates with Binder et al.'s Gravitational Clustering (GC), which emphasizes adaptive clustering, cluster enumeration, and decentralized diffusion–adaptation schemes in wireless sensor networks. GC features mobile mass units subject to Newtonian or generalized attractive forces, periodic fusion, and robust local leadership via mass thresholding (Binder et al., 2017). GFC's topological extension and privacy-preserving formulation generalize these ideas for federated, LDP-privatized, non-IID data, replacing mobile-unit dynamics with potential fields and persistent homology on synthetic probe sets (Long et al., 30 Nov 2025).
A plausible implication is that GFC's structural separation of local (privacy-constrained) summarization and global (physics-inspired, topology-driven) aggregation may serve as a blueprint for future privacy-preserving distributed learning methods, enabling flexible accommodation of privacy budgets, cluster heterogeneity, and communication constraints.