Papers
Topics
Authors
Recent
Search
2000 character limit reached

K-bMOM: a robust Lloyd-type clustering algorithm based on bootstrap Median-of-Means

Published 10 Feb 2020 in stat.ME and stat.ML | (2002.03899v2)

Abstract: We propose a new clustering algorithm that is robust to the presence of outliers in the dataset. We perform Lloyd-type iterations with robust estimates of the centroids. More precisely, we build on the idea of median-of-means statistics to estimate the centroids, but allow for replacement while constructing the blocks. We call this methodology the bootstrap median-of-means (bMOM) and prove that if enough blocks are generated through the bootstrap sampling, then it has a better breakdown point for mean estimation than the classical median-of-means (MOM), where the blocks form a partition of the dataset. From a clustering perspective, bMOM enables to take many blocks of a desired size, thus avoiding possible disappearance of clusters in some blocks, a pitfall that can occur for the partition-based generation of blocks of the classical median-of-means. Experiments on simulated datasets show that the proposed approach, called K-bMOM, performs better than existing robust K-means based methods. Guidelines are provided for tuning the hyper-parameters K-bMOM in practice. It is also recommended to the practitionner to use such a robust approach to initialize their clustering algorithm. Finally, considering a simplified and theoretical version of our estimator, we prove its robustness to adversarial contamination by deriving robust rates of convergence for the K-means distorsion. To our knowledge, it is the first result of this kind for the K-means distorsion.

Citations (12)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.