Image Classification with Small Datasets: Overview and Benchmark

Published 23 Dec 2022 in cs.CV, cs.AI, and cs.NE | (2212.12478v1)

Abstract: Image classification with small datasets has been an active research area in the recent past. However, as research in this scope is still in its infancy, two key ingredients are missing for ensuring reliable and truthful progress: a systematic and extensive overview of the state of the art, and a common benchmark to allow for objective comparisons between published methods. This article addresses both issues. First, we systematically organize and connect past studies to consolidate a community that is currently fragmented and scattered. Second, we propose a common benchmark that allows for an objective comparison of approaches. It consists of five datasets spanning various domains (e.g., natural images, medical imagery, satellite data) and data types (RGB, grayscale, multispectral). We use this benchmark to re-evaluate the standard cross-entropy baseline and ten existing methods published between 2017 and 2021 at renowned venues. Surprisingly, we find that thorough hyper-parameter tuning on held-out validation data results in a highly competitive baseline and highlights a stunted growth of performance over the years. Indeed, only a single specialized method dating back to 2019 clearly wins our benchmark and outperforms the baseline classifier.




Summary

The paper presents a novel benchmark and taxonomy to evaluate image classification methods on small datasets.
It shows that hyper-parameter optimization is critical, with well-tuned cross-entropy baselines outperforming most specialized methods.
It categorizes techniques into architecture, cost function, data augmentation, latent augmentation, and warm-starting, guiding future research.

Image Classification with Small Datasets: Overview and Benchmark

This essay synthesizes the key contributions and findings of "Image Classification with Small Datasets: Overview and Benchmark" (2212.12478), focusing on its implications for practitioners in the field of data-efficient deep learning. The paper addresses the pressing need for a systematic analysis of existing methods and a standardized benchmark for image classification under data scarcity.

Core Contributions

The paper's primary contributions are twofold: a comprehensive review of existing literature and the introduction of a novel benchmark for image classification with small datasets. The literature review categorizes existing methods into five main families: architecture, cost function, data augmentation, latent augmentation, and warm-starting. This taxonomy provides a structured overview of the research landscape. The benchmark comprises five datasets spanning diverse domains and data types, including RGB, grayscale, and multispectral imagery, thereby addressing the limitations of existing benchmarks that primarily focus on natural images.

Benchmark and Evaluation

The authors re-evaluate ten existing methods, alongside a standard cross-entropy baseline, on the proposed benchmark. A key aspect of the evaluation is that hyper-parameter optimization (HPO) is performed for each method and each dataset individually, using held-out validation data. This rigorous approach reveals a sobering result: most specialized methods fail to outperform a well-tuned cross-entropy baseline. Only one method, dating back to 2019, clearly outperforms the baseline (Figure 1).


Figure 1: Accuracy of state-of-the-art methods and baselines on the proposed benchmark, highlighting the limited performance progress over the years.

The authors also find that an untuned baseline, trained with default hyper-parameters, significantly underperforms the HPO-tuned baseline, underscoring the critical importance of proper hyper-parameter optimization in data-deficient scenarios. This suggests that the perceived progress in the field may be, in part, an illusion caused by comparisons against insufficiently tuned baselines.
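
The gap between an untuned and a tuned baseline can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the synthetic data stands in for a small image dataset, a softmax regression stands in for the network, and random search stands in for whatever HPO procedure the paper actually uses (this summary does not specify it). The names `make_split`, `train_softmax`, `random_search`, and `DEFAULT_CFG` are hypothetical.

```python
import numpy as np

# Hypothetical "default" hyper-parameters, i.e. the untuned baseline.
DEFAULT_CFG = {"lr": 1e-3, "weight_decay": 0.0}


def make_split(n_per_class=20, n_classes=3, dim=10, seed=0):
    """Synthetic stand-in for a small dataset, split into train and validation halves."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=1.5, size=(n_classes, dim))
    y = np.repeat(np.arange(n_classes), n_per_class)
    X = centers[y] + rng.normal(size=(len(y), dim))
    idx = rng.permutation(len(y))
    half = len(y) // 2
    tr, val = idx[:half], idx[half:]
    return X[tr], y[tr], X[val], y[val]


def train_softmax(X, y, lr, weight_decay, epochs=100, n_classes=3):
    """Softmax regression trained with cross-entropy and L2 weight decay."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = X.T @ (P - Y) / len(X) + weight_decay * W
        W -= lr * grad
    return W


def accuracy(W, X, y):
    return float(((X @ W).argmax(axis=1) == y).mean())


def random_search(X_tr, y_tr, X_val, y_val, n_trials=20, seed=0):
    """Random search over log-uniform ranges, scored on held-out validation data.

    The default configuration is always the first candidate, so the selected
    configuration is never worse than the default on the validation split.
    """
    rng = np.random.default_rng(seed)
    candidates = [DEFAULT_CFG] + [
        {"lr": 10 ** rng.uniform(-4, 0), "weight_decay": 10 ** rng.uniform(-5, -1)}
        for _ in range(n_trials)
    ]
    best_acc, best_cfg = -1.0, None
    for cfg in candidates:
        acc = accuracy(train_softmax(X_tr, y_tr, **cfg), X_val, y_val)
        if acc > best_acc:
            best_acc, best_cfg = acc, cfg
    return best_acc, best_cfg


if __name__ == "__main__":
    X_tr, y_tr, X_val, y_val = make_split()
    default_acc = accuracy(train_softmax(X_tr, y_tr, **DEFAULT_CFG), X_val, y_val)
    tuned_acc, tuned_cfg = random_search(X_tr, y_tr, X_val, y_val)
    print(f"default: {default_acc:.3f}  tuned: {tuned_acc:.3f}  cfg: {tuned_cfg}")
```

In a real comparison, the selected configuration would then be scored once on a separate test split, and every competing method would receive the same search budget.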

Taxonomy of Methods

The paper categorizes existing methods into five families, depending on how the model is regularized:

Architecture: Modifications to the network architecture, such as fixed filters based on wavelet transformations or the discrete cosine transform.
Cost function: Alternative losses or added regularization and penalty terms, such as those based on cosine similarity.
Data augmentation: Techniques that enlarge the training set with transformed or synthesized samples.
Latent augmentation: Stochastic or adversarial transformations applied to features inside the network.
Warm-starting: Algorithmic schemes that initialize the classifier with weights that favor better learning on small datasets.
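
To make the "cost function" family concrete, here is a minimal sketch of a loss built on cosine similarity. It assumes the loss is one minus the cosine similarity between the network output and the one-hot target. That formulation and the name `cosine_loss` are assumptions for illustration, not necessarily the exact form evaluated in the paper.

```python
import numpy as np


def cosine_loss(outputs, labels, n_classes, eps=1e-12):
    """Mean of 1 - cos(output, one_hot(label)) over a batch.

    outputs: array of shape (batch, n_classes) with raw network outputs.
    labels:  integer array of shape (batch,) with class indices.
    """
    outputs = np.asarray(outputs, dtype=float)
    one_hot = np.eye(n_classes)[np.asarray(labels)]
    # L2-normalize the outputs; eps guards against division by zero.
    norms = np.linalg.norm(outputs, axis=1, keepdims=True)
    unit = outputs / np.maximum(norms, eps)
    # One-hot targets already have unit norm, so the dot product is the cosine.
    cos = (unit * one_hot).sum(axis=1)
    return float((1.0 - cos).mean())
```

Because the outputs are normalized, the loss depends only on the direction of the output vector and is bounded between 0 and 2, unlike cross-entropy, which can be pushed arbitrarily close to zero by scaling up the logits.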


Figure 2: Distribution of publication venues concerning the reviewed body of literature, revealing the dominance of computer vision conferences.

Practical Implications

The findings of this paper have significant practical implications for researchers and practitioners working on image classification with small datasets.

Importance of HPO: The study highlights the critical role of careful hyper-parameter optimization. Practitioners should prioritize tuning their models, as this can lead to substantial performance gains, even with standard techniques like cross-entropy loss.
Choice of Baseline: When evaluating novel methods, it is essential to compare against well-tuned baselines. This ensures that observed improvements are genuine and not simply due to the new method being better optimized than the comparison methods.
Domain Awareness: The benchmark includes datasets from different domains, emphasizing the importance of considering domain-specific characteristics when developing and evaluating methods.

Figure 3: Visualization of relative connections within the reported body of literature, showcasing the increasing interest in the domain and the low connectivity in terms of references.

Future Directions

The paper identifies several promising directions for future research:

Development of new regularization techniques: The limited success of existing specialized methods suggests a need for more effective regularization strategies specifically designed for data-deficient scenarios.
Exploration of novel architectures: Although the benchmark focuses on a common base architecture, exploring alternative architectures tailored for small datasets could lead to further performance gains.
Development of automated HPO strategies: Given the importance of HPO, developing automated techniques that can efficiently find optimal hyper-parameter configurations would be highly valuable.

Conclusion

"Image Classification with Small Datasets: Overview and Benchmark" (2212.12478) provides a valuable contribution to the field of data-efficient deep learning. Its rigorous evaluation methodology and the establishment of a challenging benchmark provide a solid foundation for future research. The paper's findings underscore the importance of careful hyper-parameter optimization and highlight the need for new approaches that can effectively address the challenges of image classification with limited data.
