Detecting and Rectifying Noisy Labels: A Similarity-based Approach
Abstract: Label noise in datasets can degrade the performance of neural network training. As modern deep networks grow in size, so does the demand for automated tools that detect such errors. In this paper, we propose post-hoc, model-agnostic methods for error detection and rectification that use the penultimate-layer features of a neural network. Our idea rests on the observation that the penultimate feature of a mislabeled data point is more similar to the features of data points from its true class than to those of data points from other classes, so the probability of a label occurring within a tight cluster of similar points is informative for detecting and rectifying errors. Extensive experiments show that our method not only achieves high performance across various types of noise but also automatically rectifies these errors to improve the quality of datasets.
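The idea in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm: it assumes cosine similarity as the similarity measure, a fixed neighbourhood size `k`, and a simple agreement `threshold`, none of which are specified in the abstract. The function name `detect_and_rectify` and all of its parameters are hypothetical. Given penultimate-layer features that have already been extracted from a trained network, it estimates how often each point's assigned label occurs among its most similar neighbours, flags points whose label is rare there, and proposes the neighbourhood's majority label instead.

```python
import numpy as np


def detect_and_rectify(features, labels, num_classes, k=5, threshold=0.5):
    """Flag likely label errors and propose corrections (illustrative sketch).

    features:    (n, d) array of penultimate-layer features (assumed precomputed).
    labels:      (n,) integer array of possibly noisy labels in [0, num_classes).
    k:           number of nearest neighbours to inspect (assumed hyperparameter).
    threshold:   minimum fraction of neighbours that must share a point's label.
    """
    feats = np.asarray(features, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n = feats.shape[0]

    # Cosine similarity between all pairs of feature vectors.
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a point must not vote for itself

    # Labels of the k most similar points for every sample.
    nn_idx = np.argsort(-sim, axis=1)[:, :k]
    nn_labels = labels[nn_idx]  # shape (n, k)

    # Empirical probability of each class within the neighbourhood.
    votes = np.zeros((n, num_classes))
    for c in range(num_classes):
        votes[:, c] = (nn_labels == c).mean(axis=1)

    # A label is suspicious when it is rare among the point's neighbours.
    agreement = votes[np.arange(n), labels]
    suspect = agreement < threshold

    # Rectify suspicious points with the neighbourhood's majority label.
    rectified = np.where(suspect, votes.argmax(axis=1), labels)
    return suspect, rectified
```

In practice the features would come from a forward pass that stops before the classification layer. Computing the full similarity matrix costs O(n^2) memory, so a larger dataset would likely need an approximate nearest-neighbour index instead.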