Approximate kernel clustering

Published 29 Jul 2008 in cs.DS, cs.CC, and math.FA | (0807.4626v2)

Abstract: In the kernel clustering problem we are given a large $n\times n$ positive semi-definite matrix $A=(a_{ij})$ with $\sum_{i,j=1}^n a_{ij}=0$ and a small $k\times k$ positive semi-definite matrix $B=(b_{ij})$. The goal is to find a partition $S_1,\ldots,S_k$ of $\{1,\ldots,n\}$ which maximizes the quantity $$ \sum_{i,j=1}^k \Big(\sum_{(p,q)\in S_i\times S_j}a_{pq}\Big)b_{ij}. $$ We study the computational complexity of this generic clustering problem which originates in the theory of machine learning. We design a constant factor polynomial time approximation algorithm for this problem, answering a question posed by Song, Smola, Gretton and Borgwardt. In some cases we manage to compute the sharp approximation threshold for this problem assuming the Unique Games Conjecture (UGC). In particular, when $B$ is the $3\times 3$ identity matrix the UGC hardness threshold of this problem is exactly $\frac{16\pi}{27}$. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when $B$ is the $k\times k$ identity matrix is $\frac{8\pi}{9}(1-\frac{1}{k})$ for every $k\ge 3$.
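To make the objective in the abstract concrete, the following is a minimal sketch that evaluates the kernel clustering value of a given partition and finds the optimum by exhaustive search on a tiny instance. It is not the approximation algorithm from the paper. The function names `clustering_value` and `brute_force_best`, the encoding of a partition as a list of cluster labels, and the toy matrices are all assumptions made for illustration.

```python
from itertools import product

def clustering_value(A, B, labels):
    """Kernel clustering objective for one partition.

    labels[p] is the index i of the part S_i containing point p, so the
    double sum over parts collapses to sum_{p,q} A[p][q] * B[labels[p]][labels[q]].
    """
    n = len(A)
    return sum(A[p][q] * B[labels[p]][labels[q]]
               for p in range(n) for q in range(n))

def brute_force_best(A, B):
    """Exhaustive search over all k^n labelings (parts may be empty).

    Only feasible for very small n; shown purely to illustrate the objective.
    """
    n, k = len(A), len(B)
    best = max(product(range(k), repeat=n),
               key=lambda lab: clustering_value(A, B, lab))
    return best, clustering_value(A, B, best)

# Toy instance (hypothetical): A is the Gram matrix of x = (-1, -1, 1, 1).
# It is positive semi-definite, and its entries sum to (sum of x)^2 = 0.
x = [-1, -1, 1, 1]
A = [[xp * xq for xq in x] for xp in x]
B = [[1, 0], [0, 1]]  # 2 x 2 identity: reward pairs placed in the same part

best_labels, best_value = brute_force_best(A, B)
```

On this toy instance the search separates the two negative points from the two positive ones. The exhaustive search takes $k^n$ evaluations, which is why an approximation algorithm is of interest for large $n$.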

Citations (39)
