Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Published 25 Aug 2017 in cs.LG, cs.CV, and stat.ML | (1708.07747v2)

Abstract: We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits. The dataset is freely available at https://github.com/zalandoresearch/fashion-mnist

Citations (8,146)

Summary

The paper introduces Fashion-MNIST, a novel dataset offering a more challenging alternative to MNIST for testing machine learning algorithms.
It details a standardized conversion pipeline that transforms Zalando product photos into 28x28 grayscale images while preserving the visual differences between classes.
Benchmarks show that classifiers score lower on Fashion-MNIST than on MNIST, which indicates that it is the harder classification task.

Fashion-MNIST: Benchmarking Machine Learning Algorithms with a Novel Image Dataset

Introduction

Fashion-MNIST is presented as an alternative to the widely used MNIST dataset of handwritten digits. MNIST has become too easy: modern deep learning models already reach very high accuracy on it, so it no longer separates strong methods from weak ones. Fashion-MNIST aims to be a more challenging benchmark for machine learning algorithms. It keeps the image size and file format of MNIST, so it can be swapped into existing benchmarking code, but replaces the digits with more varied images of fashion products.

Fashion-MNIST Dataset

Fashion-MNIST comprises 70,000 grayscale images in ten classes, with 7,000 images per class. The classes cover items such as T-shirts, trousers, dresses, and sandals. Every image has a uniform $28 \times 28$ pixel resolution, identical to MNIST, so code written for MNIST can read the dataset without extra preprocessing.
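
Because the data format is shared with MNIST, a reader written for MNIST's IDX files should also work here. The following is a minimal sketch, assuming the files are gzip-compressed IDX buffers with the usual MNIST magic numbers; the function names and the file path in the usage note are hypothetical, and the layout should be checked against the repository before relying on it.

```python
import gzip
import struct

IMAGE_MAGIC = 0x00000803  # IDX header value for unsigned-byte data with 3 dimensions
LABEL_MAGIC = 0x00000801  # IDX header value for unsigned-byte data with 1 dimension


def parse_idx_images(raw: bytes):
    """Parse an uncompressed IDX image buffer.

    Returns (images, rows, cols), where each image is a flat, row-major
    list of pixel intensities in the range 0..255.
    """
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError(f"unexpected image magic number: {magic:#x}")
    size = rows * cols
    pixels = raw[16:]
    if len(pixels) != count * size:
        raise ValueError("image payload length does not match the header")
    images = [list(pixels[i * size:(i + 1) * size]) for i in range(count)]
    return images, rows, cols


def parse_idx_labels(raw: bytes):
    """Parse an uncompressed IDX label buffer into a list of class indices."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != LABEL_MAGIC:
        raise ValueError(f"unexpected label magic number: {magic:#x}")
    labels = list(raw[8:])
    if len(labels) != count:
        raise ValueError("label payload length does not match the header")
    return labels


def load_gz(path, parser):
    """Read a gzip-compressed IDX file from a local path and parse it."""
    with gzip.open(path, "rb") as fh:
        return parser(fh.read())
```

With a locally downloaded copy, a call such as `load_gz("train-images-idx3-ubyte.gz", parse_idx_images)` would be expected to return 60,000 images with `rows == cols == 28`; the file name here is an assumption based on MNIST's naming.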

The dataset uses the same training-testing split as MNIST: 60,000 images for training and 10,000 for testing. Researchers can therefore reuse existing MNIST-based workflows and frameworks with minimal changes. The conversion pipeline derives each Fashion-MNIST image from an original Zalando online product image by resampling it to the target size and standardizing it, while preserving the visual traits that distinguish the classes. The processing includes Gaussian sharpening, grayscale conversion, and negation of intensity values.
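
Two of the pipeline steps named above, grayscale conversion and intensity negation, can be illustrated on a tiny image. This is a sketch under stated assumptions, not the authors' pipeline: the Rec. 601 luma weights are an assumption (the summary does not say how color was reduced to gray), and resampling and sharpening are omitted.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples into rows of 8-bit gray values.

    The 0.299/0.587/0.114 weights are the Rec. 601 luma coefficients,
    chosen here as an assumption rather than taken from the paper.
    """
    return [
        [min(255, round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_image
    ]


def negate(gray_image):
    """Invert 8-bit intensities so that light pixels become dark and vice versa."""
    return [[255 - value for value in row] for row in gray_image]


# A 1x3 toy image: a white pixel, a black pixel, and a pure red pixel.
toy = [[(255, 255, 255), (0, 0, 0), (255, 0, 0)]]
gray = to_grayscale(toy)
inverted = negate(gray)
```

After negation, a white background pixel maps to 0, which is how a photo taken against a light backdrop would end up with the dark background familiar from MNIST.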

Experimental Benchmarks

To establish baselines, a range of machine learning classifiers was evaluated on both Fashion-MNIST and MNIST. The classifiers included SVM, k-NN, Random Forest, and others. On Fashion-MNIST, simpler classifiers such as Decision Tree and GaussianNB reached lower accuracy, while SVC with polynomial and RBF kernels reached higher accuracy. Classifier performance therefore differs between the two datasets, and the differences between methods are more visible on Fashion-MNIST, which makes it more useful for comparing algorithms.

Test accuracies were consistently lower on Fashion-MNIST than on MNIST, which supports its use as a more demanding benchmark. For example, the accuracy of SVC with a polynomial kernel drops from 97.8% on MNIST to 89.1% on Fashion-MNIST, reflecting the greater complexity and variability of fashion images.
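
The size of that gap is easier to judge as an error rate than as an accuracy. The short calculation below uses only the two SVC figures quoted above; the function name is hypothetical.

```python
def accuracy_gap(acc_easy, acc_hard):
    """Compare two accuracies given in percent.

    Returns (drop in percentage points, ratio of the two error rates).
    """
    drop = acc_easy - acc_hard
    error_ratio = (100.0 - acc_hard) / (100.0 - acc_easy)
    return drop, error_ratio


drop, ratio = accuracy_gap(97.8, 89.1)
print(f"drop: {drop:.1f} points, error ratio: {ratio:.2f}x")
# prints "drop: 8.7 points, error ratio: 4.95x"
```

A drop of 8.7 percentage points means the error rate rises from 2.2% to 10.9%, so this classifier makes roughly five times as many mistakes on Fashion-MNIST.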

Implications and Future Developments

Fashion-MNIST adds diversity to the available benchmarks and challenges researchers to improve their algorithms on a dataset with more visual variety than handwritten digits. Because the paper provides detailed experimental results and instructions for using the dataset, later work can build on these baselines with more sophisticated models that capture the complex patterns in fashion images.

The accessibility of Fashion-MNIST also suggests that similar specialized datasets could be built for other domains, such as healthcare or automotive imagery, to support progress in visual recognition. As the machine learning community tunes algorithms on a wider range of datasets, deep learning models will likely move toward more sophisticated abstractions, tailored feature extraction methods, and improved network architectures.

Conclusion

Fashion-MNIST is a valuable resource for evaluating machine learning models because it directly addresses the simplicity of MNIST. It gives researchers a ready-made benchmark for testing models on images drawn from a real fashion catalog, which should encourage improvements in accuracy and generalization. Its continued use also sets a precedent for creating domain-specific benchmarks that support progress in AI and deep learning research.