Assessing Dataset Quality Through Decision Tree Characteristics in Autoencoder-Processed Spaces

Published 27 Jun 2023 in cs.LG and cs.AI | (2306.15392v1)

Abstract: In this paper, we examine dataset quality assessment in machine learning classification tasks. Using nine distinct datasets, each built for classification tasks of varying complexity, we illustrate the impact of dataset quality on model training and performance. We further introduce two additional datasets designed to represent specific data conditions: one maximizing entropy and the other exhibiting high redundancy. Our findings underscore the importance of appropriate feature selection, adequate data volume, and data quality in achieving high-performing machine learning models. To aid researchers and practitioners, we propose a framework for dataset quality assessment that helps evaluate whether a given dataset is sufficient and of the required quality for a specific task. This research offers insights into data assessment practices, contributing to the development of more accurate and robust machine learning models.
