Papers
Topics
Authors
Recent
Search
2000 character limit reached

Chebyshev polynomials, moment matching, and optimal estimation of the unseen

Published 6 Apr 2015 in math.ST and stat.TH | (1504.01227v2)

Abstract: We consider the problem of estimating the support size of a discrete distribution whose minimum non-zero mass is at least $ \frac{1}{k}$. Under the independent sampling model, we show that the sample complexity, i.e., the minimal sample size to achieve an additive error of $\epsilon k$ with probability at least 0.1 is within universal constant factors of $ \frac{k}{\log k}\log2\frac{1}{\epsilon} $, which improves the state-of-the-art result of $ \frac{k}{\epsilon2 \log k} $ in \cite{VV13}. Similar characterization of the minimax risk is also obtained. Our procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties, which can be evaluated in $O(n+\log2 k)$ time and attains the sample complexity within a factor of six asymptotically. The superiority of the proposed estimator in terms of accuracy, computational efficiency and scalability is demonstrated in a variety of synthetic and real datasets.

Citations (107)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 5 likes about this paper.