
Curriculum Loss: Robust Learning and Generalization against Label Corruption

Published 24 May 2019 in cs.LG and stat.ML (arXiv:1905.10045v3)

Abstract: Deep neural networks (DNNs) have great expressive power, which can even memorize samples with wrong labels. It is vitally important to reiterate robustness and generalization in DNNs against label corruption. To this end, this paper studies the 0-1 loss, which has a monotonic relationship with an empirical adversary (reweighted) risk (Hu et al., 2016). Although the 0-1 loss has some robust properties, it is difficult to optimize. To efficiently optimize the 0-1 loss while keeping its robust properties, we propose a very simple and efficient loss, i.e. curriculum loss (CL). Our CL is a tighter upper bound of the 0-1 loss compared with conventional summation based surrogate losses. Moreover, CL can adaptively select samples for model training. As a result, our loss can be deemed as a novel perspective of curriculum sample selection strategy, which bridges a connection between curriculum learning and robust learning. Experimental results on benchmark datasets validate the robustness of the proposed loss.

Citations (163)

Summary

  • The paper introduces Curriculum Loss (CL) and Noise Pruned Curriculum Loss (NPCL), innovative loss functions designed to enhance deep neural network robustness against label corruption.
  • The proposed methods leverage classification margins to filter mislabeled samples and can prune estimated noise, providing a provably tighter upper bound on the 0-1 loss than standard summation-based surrogate losses.
  • Empirical validation shows CL and NPCL outperform state-of-the-art approaches on benchmark datasets, achieving superior robustness and accuracy under high label corruption.

Overview of Curriculum Loss for Robust Learning

The paper presents a novel approach to improving deep neural network (DNN) robustness against label corruption through a new loss function termed Curriculum Loss (CL). Label corruption can arise from various sources, such as errors in annotation, automated data collection, or even intentional manipulation. The authors note that the expressive power of DNNs, while beneficial for fitting complex patterns, can also lead to memorization of incorrect or noisy labels, thereby compromising model generalization.

Key Contributions

  1. Curriculum Loss (CL): The paper introduces CL as a more robust and adaptive alternative to conventional loss functions. CL is structured as a tighter upper bound of the 0-1 loss, effectively optimizing robustness against label errors. It utilizes the classification margin to filter out samples progressively, thereby reducing the influence of mislabeled data during training.
  2. Noise Pruned Curriculum Loss (NPCL): Extending CL, NPCL incorporates inherent mechanisms to prune estimated noisy samples from the dataset. This approach assumes prior knowledge of the noise rate and allows the model to adapt to high levels of label corruption, further distinguishing clean samples for effective learning.
  3. Mathematical Rigor and Efficiency: The architecture of CL and NPCL includes proofs demonstrating their tighter bounding properties compared to traditional surrogate losses. Additionally, the proposed loss functions can be efficiently optimized via an O(n log n) algorithm, making them suitable for integration with existing deep learning frameworks, especially in mini-batch processing.
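The selection mechanism behind CL and NPCL can be sketched in a few lines. The exact objective below, min over binary selections v of max(Σ v_i ℓ_i, n − Σ v_i), is an assumed formulation for illustration (trading the summed loss of selected samples against the count of discarded ones), and the function names are hypothetical; because one term grows and the other shrinks as more sorted samples are admitted, a single scan over the ascending sort yields the O(n log n) cost the paper cites.

```python
import numpy as np

def curriculum_loss(losses):
    """Hedged sketch of curriculum-loss sample selection.

    Assumes each per-sample loss l_i upper-bounds the corresponding 0-1 loss,
    and the (assumed) objective
        Q(l) = min_{v in {0,1}^n} max( sum_i v_i * l_i , n - sum_i v_i ).
    Returns the objective value and a boolean mask of selected samples.
    """
    losses = np.asarray(losses, dtype=float)
    n = len(losses)
    order = np.argsort(losses)          # ascending: easiest samples first
    csum = np.cumsum(losses[order])     # loss of keeping the k smallest
    best_val, best_k = float(n), 0      # selecting nothing costs n
    for k in range(1, n + 1):
        val = max(csum[k - 1], n - k)   # summed loss vs. discarded count
        if val < best_val:
            best_val, best_k = val, k
    selected = np.zeros(n, dtype=bool)
    selected[order[:best_k]] = True
    return best_val, selected

def npcl_select(losses, noise_rate):
    """Noise-pruned variant (illustrative): with an assumed known noise
    rate, first drop the n * noise_rate largest-loss samples, which are
    likely mislabeled, then run curriculum selection on the remainder."""
    losses = np.asarray(losses, dtype=float)
    n = len(losses)
    keep = int(round(n * (1.0 - noise_rate)))
    kept_idx = np.argsort(losses)[:keep]
    val, sel_kept = curriculum_loss(losses[kept_idx])
    selected = np.zeros(n, dtype=bool)
    selected[kept_idx[sel_kept]] = True
    return val, selected
```

In a training loop, the returned mask would gate which samples contribute to the mini-batch gradient, so the model progressively ignores high-loss (likely corrupted) examples.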

Experimental Validation

The effectiveness of CL and NPCL is empirically validated against benchmark datasets such as MNIST, CIFAR10, and CIFAR100. The results showcase superior robustness and accuracy in the presence of label corruption compared to other state-of-the-art approaches like generalized cross-entropy and co-teaching methods. Notably, NPCL offers a competitive edge without necessitating dual-network architectures, thereby optimizing computational efficiency.

Theoretical Implications

From a theoretical standpoint, this paper contributes to the understanding of loss functions in the context of adversarial risk minimization. By establishing a link between worst-case risk estimates and empirical adversarial risk under label-corrupted distributions, the authors provide a framework that aligns risk optimization with empirical data reality, improving DNN robustness and generalization.
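The tighter-bound claim can be summarized in one sandwich inequality, assuming (as the paper does for its base losses) that each surrogate ℓ_i upper-bounds the corresponding 0-1 loss term; the notation Q(ℓ) for the curriculum loss is illustrative:

```latex
\sum_{i=1}^{n} \mathbf{1}\!\left[\text{sample } i \text{ misclassified}\right]
\;\le\; Q(\ell)
\;\le\; \sum_{i=1}^{n} \ell_i
```

That is, CL sits between the 0-1 loss and the conventional summation-based surrogate, which is why optimizing it retains more of the 0-1 loss's robustness to corrupted labels.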

Future Directions

The insights provided by CL and NPCL are poised to influence future research in AI, particularly in the development of more adaptive learning algorithms that can dynamically adjust to noisy environments. Further investigations could explore extensions of CL to address imbalanced datasets or incorporate class-specific diversity measures, offering broader application potential across varied fields like natural language processing and image recognition.

Overall, this paper makes a substantive technical contribution towards systematically addressing the challenge of robust learning in the presence of label corruption, with implications for theoretical advancement and practical application in AI.
