Papers
Topics
Authors
Recent
Search
2000 character limit reached

Spectral Clustering for Discrete Distributions

Published 25 Jan 2024 in cs.LG, cs.AI, and stat.ML | (2401.13913v2)

Abstract: The discrete distribution is often used to describe complex instances in machine learning, such as images, sequences, and documents. Traditionally, clustering of discrete distributions (D2C) has been approached using Wasserstein barycenter methods. These methods operate under the assumption that clusters can be well-represented by barycenters, which is seldom true in many real-world applications. Additionally, these methods are not scalable for large datasets due to the high computational cost of calculating Wasserstein barycenters. In this work, we explore the feasibility of using spectral clustering combined with distribution affinity measures (e.g., maximum mean discrepancy and Wasserstein distance) to cluster discrete distributions. We demonstrate that these methods can be more accurate and efficient than barycenter methods. To further enhance scalability, we propose using linear optimal transport to construct affinity matrices efficiently for large datasets. We provide theoretical guarantees for the success of our methods in clustering distributions. Experiments on both synthetic and real data show that our methods outperform existing baselines.

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (3)

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 3 tweets with 1 like about this paper.