
Training on test data: Removing near duplicates in Fashion-MNIST

Published 19 Jun 2019 in cs.LG, cs.CV, and stat.ML | (1906.08255v1)

Abstract: MNIST and Fashion-MNIST are extremely popular benchmark datasets in machine learning. Fashion-MNIST improves on MNIST by posing a harder problem, increasing the diversity of its test set, and more accurately representing a modern computer vision task. To improve the data quality of Fashion-MNIST, this paper investigates near-duplicate images shared between the training and test sets. Such near-duplicates artificially inflate the test accuracy of machine learning models, because a model can score well on a test image simply by having seen an almost identical image during training. This paper identifies near-duplicate images in Fashion-MNIST and proposes a version of the dataset with the near-duplicates removed.
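The abstract does not say which similarity measure the paper uses, so the following is only a minimal sketch of the general idea of train/test near-duplicate filtering, not the author's method. It assumes each image is a flattened list of 0-255 grayscale pixels (Fashion-MNIST images are 28x28), uses mean absolute pixel difference as a stand-in distance, and uses a hypothetical `threshold` of 5.0. The helper names `mean_abs_diff`, `find_near_duplicates`, and `remove_near_duplicates` are illustrative.

```python
from typing import List, Sequence, Tuple

# Assumption: an image is a flattened sequence of grayscale pixel values (0-255).
Image = Sequence[int]


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute per-pixel difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_near_duplicates(
    train: Sequence[Image], test: Sequence[Image], threshold: float = 5.0
) -> List[Tuple[int, int]]:
    """Return (test_index, train_index) pairs whose distance is <= threshold.

    The metric and the default threshold are hypothetical choices for illustration.
    """
    pairs = []
    for i, test_img in enumerate(test):
        for j, train_img in enumerate(train):
            if mean_abs_diff(test_img, train_img) <= threshold:
                pairs.append((i, j))
    return pairs


def remove_near_duplicates(
    train: Sequence[Image], test: Sequence[Image], threshold: float = 5.0
) -> List[Image]:
    """Drop every test image that has at least one near-duplicate in train."""
    flagged = {i for i, _ in find_near_duplicates(train, test, threshold)}
    return [img for i, img in enumerate(test) if i not in flagged]


if __name__ == "__main__":
    # Toy 4-pixel "images" (made up for illustration, not Fashion-MNIST data).
    train = [[0, 0, 0, 0], [255, 255, 255, 255]]
    test = [[1, 0, 2, 0], [128, 128, 128, 128], [255, 250, 255, 255]]
    print(find_near_duplicates(train, test))    # [(0, 0), (2, 1)]
    print(remove_near_duplicates(train, test))  # [[128, 128, 128, 128]]
```

The brute-force double loop compares every test image with every training image, which keeps the logic easy to follow but is far too slow in pure Python for the full 60,000 x 10,000 pairs; a practical version would vectorize the distance computation or use a nearest-neighbor index.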

Citations (3)


Authors (1)
