Papers
Topics
Authors
Recent
Search
2000 character limit reached

Information-theoretic bounds and phase transitions in clustering, sparse PCA, and submatrix localization

Published 18 Jul 2016 in math.ST, cond-mat.dis-nn, cond-mat.stat-mech, cs.IT, math.IT, math.PR, and stat.TH | (1607.05222v2)

Abstract: We study the problem of detecting a structured, low-rank signal matrix corrupted with additive Gaussian noise. This includes clustering in a Gaussian mixture model, sparse PCA, and submatrix localization. Each of these problems is conjectured to exhibit a sharp information-theoretic threshold, below which the signal is too weak for any algorithm to detect. We derive upper and lower bounds on these thresholds by applying the first and second moment methods to the likelihood ratio between these "planted models" and null models where the signal matrix is zero. Our bounds differ by at most a factor of root two when the rank is large (in the clustering and submatrix localization problems, when the number of clusters or blocks is large) or the signal matrix is very sparse. Moreover, our upper bounds show that for each of these problems there is a significant regime where reliable detection is information- theoretically possible but where known algorithms such as PCA fail completely, since the spectrum of the observed matrix is uninformative. This regime is analogous to the conjectured 'hard but detectable' regime for community detection in sparse graphs.

Citations (76)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.