Round-efficient Fully-scalable MPC algorithms for k-Means

Published 1 Apr 2026 in cs.DS | (2604.00954v1)

Abstract: We study Euclidean $k$-Means under the Massively Parallel Computation (MPC) model, focusing on the \emph{fully-scalable} setting. Our main result is a fully-scalable $O((\log n/\log\log n)^{2})$-approximation in $O(1)$ rounds. Previously, fully-scalable algorithms for $k$-Means either run in super-constant $O(\log\log n \cdot \log\log\log n)$ rounds, albeit with a better $O(1)$-approximation [Cohen-Addad et al., SODA'26], or suffer from bicriteria guarantees [Bhaskara and Wijewardena, ICML'18; Czumaj et al., ICALP'24]. Our algorithm also gives an $O(\log n/\log\log n)$-approximation for $k$-Median, which improves a recent $O(\log n)$-approximation [Goranci et al., SODA'26], and this $o(\log n)$ ratio breaks the fundamental barrier of tree embedding methods used therein. Our main technical contribution is a new variant of the MP algorithm [Mettu and Plaxton, SICOMP'03] that works for general metrics, whose new guarantee is the Lagrangian Multiplier Preserving (LMP) property, which, importantly, holds even under arbitrary distance distortions. Allowing distance distortion is crucial for efficient MPC implementations and useful for efficient algorithm design in general, whereas preserving the LMP property under distance distortion is known to be a significant technical challenge. As a byproduct of our techniques, we also obtain an $O(1)$-approximation to the optimal \emph{value} in $O(1)$ rounds, which conceptually suggests that achieving a true $O(1)$-approximation (for the solution) in $O(1)$ rounds may be a sensible goal for future study.


Authors (4)

Summary

The paper introduces a fully scalable, constant-round MPC algorithm achieving O((log n / log log n)^2) and O(log n / log log n) approximations for k-Means and k-Median respectively.
It innovates with a robust, LMP-preserving Mettu-Plaxton variant that maintains performance under arbitrary metric distortions using parallel metric ruling sets.
The approach overcomes classical tree embedding lower bounds, enhancing clustering efficiency on massive datasets with low per-machine memory requirements.

Round-Efficient Fully-Scalable MPC Algorithms for $k$-Means: A Technical Synthesis

Introduction and Context

This work addresses the design of approximation algorithms for classical $(k,z)$-clustering, in particular the $k$-Means problem ($z=2$), within the Massively Parallel Computation (MPC) model. It focuses on algorithms that are fully scalable (i.e., operate with local memory $s = N^\epsilon$ for any constant $\epsilon \in (0,1)$) and round-efficient (constant or polylogarithmic rounds). Prior work has not achieved a non-bicriteria, constant-round approximation for $k$-Means in this regime: existing algorithms either exceed constant round complexity (e.g., $O(\log\log n\cdot\log\log\log n)$ rounds for an $O(1)$-approximation [Cohen-Addad et al., SODA'26]) or settle for bicriteria guarantees [Bhaskara and Wijewardena, ICML'18; Czumaj et al., ICALP'24].
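To make the objective concrete, here is a minimal sketch of the $(k,z)$-clustering cost, assuming the standard definition in which each point pays its Euclidean distance to the nearest center raised to the power $z$. The function name `clustering_cost` and the toy inputs are illustrative, not taken from the paper.

```python
import math
from typing import Sequence

Point = Sequence[float]


def clustering_cost(points: Sequence[Point], centers: Sequence[Point], z: int) -> float:
    """Sum over all points of (distance to nearest center) ** z.

    z = 1 gives the k-Median objective, z = 2 the k-Means objective.
    """
    return sum(min(math.dist(p, c) for c in centers) ** z for p in points)


# Toy instance: three points on a line, two centers.
pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
ctrs = [(0.0, 0.0), (10.0, 0.0)]
k_median_cost = clustering_cost(pts, ctrs, z=1)  # only (2, 0) pays: 2
k_means_cost = clustering_cost(pts, ctrs, z=2)   # only (2, 0) pays: 2 ** 2
```

The only difference between the two objectives is the exponent, which is why a single $(k,z)$ framework covers both.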

The technical challenge arises from the incompatibility of classic sequential and parallel techniques with the low-memory, communication-restricted nature of fully-scalable MPC. Further barriers come from communication lower bounds and from the hardness of simulating sequential ruling set and graph problems in MPC, which particularly affects the ruling set subproblems central to primal-dual frameworks.

Contributions and Algorithmic Results

The central contributions are twofold:

A fully-scalable, constant-round MPC algorithm for $k$-Means and $k$-Median: The proposed algorithm achieves an $O((\log n/\log\log n)^2)$-approximation for $k$-Means and an $O(\log n/\log\log n)$-approximation for $k$-Median in $O(1)$ rounds. For $k$-Median this improves on the prior $O(\log n)$-approximation [Goranci et al., SODA'26], and the resulting $o(\log n)$ ratio breaks the barrier of the tree embedding methods used there.
Theoretical breakthroughs in rounding and Lagrangian Multiplier Preserving (LMP) relaxation: The technical linchpin is a novel variant of the Mettu-Plaxton (MP) algorithm adapted for the facility location relaxation of clustering, which additionally preserves the LMP property even under arbitrary distance distortions—crucial for practical, efficient MPC implementations that rely on approximate (rather than exact) geometric primitives.

The results improve the approximation/round-complexity trade-off in the fully-scalable MPC regime and, for $k$-Median, surpass the $O(\log n)$ barrier of prior tree embedding methods.

Technical Approach

LMP-Robust Mettu-Plaxton Variant

The paper extends the fractional (and integral) MP algorithm to supply an LMP approximation guarantee under arbitrary metric distortions. Crucially, this covers both general metrics and the noisier, geometrically distorted distances typical of MPC-friendly primitives such as approximate range or near-neighbor search.
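For reference, the LMP property can be written as follows. This is the standard textbook formulation for facility location with a uniform opening cost, not a statement copied from the paper; the symbols $\lambda$ (opening cost), $\alpha$ (approximation factor), $S$ (opened facilities), and $P$ (input points) are notation assumed here.

$$
\sum_{p \in P} d(p, S)^z \;+\; \alpha \cdot \lambda \, |S| \;\le\; \alpha \cdot \mathrm{OPT}_\lambda,
\qquad
\mathrm{OPT}_\lambda \;=\; \min_{S'} \Big( \lambda \, |S'| + \sum_{p \in P} d(p, S')^z \Big).
$$

The point of charging the opening cost with the full factor $\alpha$ is the following short calculation. If the algorithm happens to open exactly $|S| = k$ facilities, then for an optimal $k$-clustering of cost $\mathrm{OPT}_k$ we have $\mathrm{OPT}_\lambda \le \lambda k + \mathrm{OPT}_k$, and therefore

$$
\sum_{p \in P} d(p, S)^z \;\le\; \alpha \, (\mathrm{OPT}_\lambda - \lambda k) \;\le\; \alpha \cdot \mathrm{OPT}_k .
$$

An ordinary $\alpha$-approximation for facility location would not allow the $\lambda k$ terms to cancel in this way.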

Key aspects include:

Distance distortion robustness: The analysis shows that the LMP property, which is essential for transferring facility location solutions to the hard constraint of at most $k$ centers in $k$-Means, can be preserved under arbitrary metric distortion. This is nontrivial to maintain because of the interplay between facility opening cost and connection cost.
Metric ruling sets: The rounding procedure, which converts fractional solutions to integral ones without violating the constraint on the number of centers, relies on Euclidean metric ruling set algorithms, notably leveraging recent progress that provides $O(1)$-round computation in Euclidean spaces [Czumaj et al., ICALP'24].
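As a rough illustration of what a metric ruling set is, the sketch below uses the simple sequential greedy rule: keep a point only if it is farther than `r` from everything kept so far. This is an assumption-laden stand-in, not the parallel Euclidean routine the paper relies on; the name `greedy_ruling_set` and the threshold `r` are hypothetical, and parallel algorithms typically relax the covering radius to a larger multiple of `r`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def greedy_ruling_set(points: Sequence[Point], r: float) -> List[Point]:
    """Sequential greedy ruling set.

    Guarantees (both follow directly from the selection rule):
      * separation: any two chosen points are more than r apart;
      * covering:   every input point is within r of some chosen point.
    """
    chosen: List[Point] = []
    for p in points:
        if all(math.dist(p, c) > r for c in chosen):
            chosen.append(p)
    return chosen


pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0), (3.2, 0.1), (10.0, 10.0)]
rulers = greedy_ruling_set(pts, r=1.0)
```

The greedy loop is inherently sequential (each decision depends on all earlier ones), which is one way to see why simulating it in a constant number of low-memory MPC rounds is difficult.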

Fractional Solution via Scalable Primitives

The algorithm is constructed as follows:

Fractional clustering relaxation: The (power-$z$) facility location problem is approximately solved via the robust MP variant, in parallel across guesses for the Lagrange multiplier (facility opening cost). The implementation relies on geometric primitives (approximate range queries).
Rounding to integral solutions: A cost-preserving, separation-inducing sparsification and subsequent rounding pipeline is implemented via parallel metric ruling-set construction and partitioning across different cost scales, eventually producing an integral cluster assignment.
Value estimation in constant rounds: Even when solution rounding is challenging, a constant-round $O(1)$-approximation of the optimal value is obtainable, highlighting a value/solution gap reminiscent of related results for MST and facility location in streaming and MPC settings.
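The first step above builds on the Mettu-Plaxton algorithm. The sketch below shows the classic sequential version for uniform facility location with $z = 1$ and exact distances, written from the textbook description; it is not the paper's LMP-preserving, distortion-robust variant. The names `mp_radius`, `mettu_plaxton`, and the opening cost `lam` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mp_radius(i: int, points: Sequence[Point], lam: float) -> float:
    """Radius r of point i solving  sum_q max(0, r - d(p_i, q)) = lam  (lam > 0)."""
    ds = sorted(math.dist(points[i], q) for q in points)  # ds[0] == 0.0 (the point itself)
    prefix = 0.0
    for m, d in enumerate(ds, start=1):
        prefix += d
        # If exactly the m nearest points lie in the ball: m * r - prefix = lam.
        r = (lam + prefix) / m
        next_d = ds[m] if m < len(ds) else math.inf
        if r <= next_d:  # candidate radius is consistent with "m points inside"
            return r
    raise AssertionError("unreachable: the last candidate always qualifies")


def mettu_plaxton(points: Sequence[Point], lam: float) -> List[int]:
    """Return indices of opened facilities.

    Points are scanned by increasing radius; a point opens unless an
    already-open facility lies within twice its radius.
    """
    radii = [mp_radius(i, points, lam) for i in range(len(points))]
    opened: List[int] = []
    for i in sorted(range(len(points)), key=lambda j: radii[j]):
        if all(math.dist(points[i], points[j]) > 2 * radii[i] for j in opened):
            opened.append(i)
    return opened


# Two well-separated groups of three points each.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cheap = mettu_plaxton(pts, lam=1.0)      # low opening cost: one facility per group
costly = mettu_plaxton(pts, lam=1000.0)  # high opening cost: a single facility
```

Varying `lam` changes how many facilities open, which is why running the relaxation over many guesses of the Lagrange multiplier is a natural way to target $k$ centers.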

Numerical and Theoretical Claims

Approximation guarantees: For $k$-Means ($z=2$), the algorithm achieves an $O((\log n/\log\log n)^2)$-approximation in $O(1)$ rounds, and for $k$-Median ($z=1$), an $O(\log n/\log\log n)$-approximation in $O(1)$ rounds. This is the first to surpass the $O(\log n)$ barrier for $k$-Median in the fully-scalable setting.
Value-only estimation: By using only the (fractional) MP facility location relaxation, an $O(1)$-approximation of the objective value is attainable in $O(1)$ rounds, indicating that the remaining hardness for the solution case is intrinsic to rounding rather than the underlying relaxation.
Low-dimension trade-off: When the input is low-dimensional, the ruling set routine can be instantiated to deliver a stronger approximation guarantee even for the integral $k$-Means solution, in constant rounds.
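To get a feel for how the stated ratios compare with $\log n$, the following sketch evaluates the bare expressions for a few input sizes. It assumes base-2 logarithms and ignores the constants hidden in the $O(\cdot)$ notation, so the numbers are only indicative of growth, not of actual approximation factors. The helper name `ratio_terms` is hypothetical.

```python
import math
from typing import Tuple


def ratio_terms(n: int) -> Tuple[float, float, float]:
    """Return (log n, log n / log log n, (log n / log log n) ** 2) with base-2 logs."""
    log_n = math.log2(n)
    log_log_n = math.log2(log_n)
    median_term = log_n / log_log_n  # shape of the k-Median ratio
    means_term = median_term ** 2    # shape of the k-Means ratio
    return log_n, median_term, means_term


for exponent in (16, 32, 64):
    log_n, median_term, means_term = ratio_terms(2 ** exponent)
    print(f"n = 2^{exponent}: log n = {log_n:.0f}, "
          f"log n / log log n = {median_term:.2f}, squared = {means_term:.2f}")
```

The middle column grows strictly slower than $\log n$, which is the sense in which the $k$-Median ratio is $o(\log n)$.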

Rounding, Euclidean Structure, and the Limitations of Tree-Based Approaches

A central theoretical insight is that classical tree embedding-based approximations cannot yield an $o(\log n)$-approximation for $k$-Median, due to inherent lower bounds on distortion, and do not apply to higher-power cost functions (i.e., $z > 1$). The present work's ruling set-based rounding, facilitated by the strong Euclidean structure in the input, yields more powerful parallel routines that both circumvent the combinatorial difficulties of graph-based ruling sets in MPC and break the aforementioned approximation barrier.
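A short calculation, offered here as a general observation rather than the paper's argument, shows why higher powers interact badly with distance distortion. Suppose a surrogate distance $\tilde d$ (notation assumed here) distorts every pair by at most a factor $D$. Then

$$
d(p,q) \;\le\; \tilde d(p,q) \;\le\; D \cdot d(p,q)
\quad\Longrightarrow\quad
d(p,q)^z \;\le\; \tilde d(p,q)^z \;\le\; D^z \cdot d(p,q)^z ,
$$

so a factor-$D$ distortion in distances becomes a factor-$D^z$ distortion in the $(k,z)$-clustering cost, i.e. $D^2$ for $k$-Means. If the guarantee holds only in expectation, as with randomized tree embeddings, the situation is worse for $z > 1$: a bound $\mathbb{E}[\tilde d(p,q)] \le D \cdot d(p,q)$ gives no upper bound on $\mathbb{E}[\tilde d(p,q)^z]$, since Jensen's inequality only yields the lower bound $\mathbb{E}[\tilde d(p,q)^z] \ge \big(\mathbb{E}[\tilde d(p,q)]\big)^z$.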

Implications and Future Directions

Practical Impact: The results provide constant-round approximate clustering in the MPC model with arbitrarily small per-machine memory (even $s = N^\epsilon$ for an arbitrarily small constant $\epsilon > 0$). This improves the feasibility of performing clustering (especially $k$-Means/$k$-Median) on massive, high-dimensional datasets using commodity clusters or in-memory systems built atop MapReduce/Spark abstractions.
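As a back-of-the-envelope illustration of the memory regime, the sketch below computes the per-machine memory $s = N^\epsilon$ and the resulting machine count, assuming total memory is roughly linear in the input size so that about $N / s$ machines are needed. That linear-total-memory assumption, the helper name `mpc_layout`, and the sample values are all illustrative and not taken from the paper.

```python
import math
from typing import Tuple


def mpc_layout(n_items: int, eps: float) -> Tuple[int, int]:
    """Return (local memory s, number of machines) for s = ceil(N ** eps).

    Assumes total memory ~ N, i.e. machines = ceil(N / s).
    """
    s = math.ceil(n_items ** eps)
    machines = -(-n_items // s)  # integer ceiling division
    return s, machines


s_half, m_half = mpc_layout(10 ** 9, 0.5)    # balanced: s and machine count are similar
s_small, m_small = mpc_layout(10 ** 9, 0.1)  # tiny local memory, very many machines
```

Smaller $\epsilon$ means each machine sees a vanishing fraction of the data, which is what makes the fully-scalable setting demanding.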

Theoretical Impact: Establishing a parallel, round-efficient, and distortion-robust LMP relaxation, and tying its effectiveness to recent metric ruling set routines, suggests possibilities for closing the value/solution gap for other combinatorial optimization problems in MPC (notably MST and facility location). The approach may extend to further variants and generalizations (location-constrained clustering, $(k,z)$-clustering with alternative metrics, etc.).

Research Directions: The most significant open question is whether a true $O(1)$-approximation for integral $k$-Means clustering in $O(1)$ rounds in the general, fully-scalable MPC model is possible. Further advances in parallel geometric primitives (e.g., ruling sets, range queries) and tight LMP-preserving relaxations could bridge the remaining logarithmic gaps.

Conclusion

This paper provides constant-round, fully-scalable approximation algorithms for $k$-Means and $k$-Median clustering in the MPC model without bicriteria relaxations. By introducing a robust, LMP-preserving MP scheme and leveraging state-of-the-art Euclidean ruling set routines, it surpasses the tree embedding barrier for $k$-Median, achieves polylogarithmic approximations in constant rounds, and shows that the optimal value can be approximated within $O(1)$ in $O(1)$ rounds. These results open new directions for both parallel sublinear algorithm design and the theoretical underpinnings of parallel combinatorial optimization.