Annotation Discrepancy & Negative Learning
- The paper proposes a dynamic adaptive thresholding mechanism to partition confident and non-confident samples in FER, significantly mitigating annotation noise.
- It introduces a negative class consistency loss with dual classifier heads to harness information from non-confident examples for robust learning.
- Empirical evaluations on RAF-DB and FERPlus demonstrate substantial accuracy gains across symmetric and asymmetric noise regimes.
Annotation discrepancy refers to inconsistencies or inaccuracies in annotation labels, particularly prevalent in tasks such as facial expression recognition (FER) where subjectivity, image quality, and annotator interpretation contribute to label noise. Negative learning, in this context, is an approach leveraging information about incorrect or negative class assignments to guide robust learning, especially in the presence of noisy annotations. Collectively, these concepts form the methodological foundation for enhancing robustness against annotation noise in large-scale supervised learning settings.
1. Annotation Discrepancy in Facial Expression Recognition
FER datasets are especially susceptible to annotation discrepancies due to subjective interpretation of facial expressions and varying image clarity. Noisy annotations arise inherently, resulting in label errors that degrade the performance of standard supervised learning pipelines. The prevalence and severity of such noise demand targeted strategies beyond conventional loss minimization, particularly as dataset sizes and class counts increase. Unlike synthetic settings, real-world annotation discrepancy is systematic rather than uniformly random, complicating the estimation and mitigation of its impact (Gera et al., 2023).
2. Dynamic Adaptive Thresholding for Confidence Partitioning
To address the challenge of annotation discrepancy, a dynamic adaptive thresholding mechanism is proposed. For a dataset with $C$ expression classes and a mini-batch $B$, the per-class adaptive threshold $\tau_c$ is established as the average confidence of the model on weak augmentations of samples annotated with class $c$: $\tau_c = \frac{1}{|B_c|} \sum_{i \in B_c} p_c(x_i^w)$, where $B_c = \{ i \in B : y_i = c \}$ is the set of indices with label $c$ and $p_c(x_i^w)$ is the predicted probability for class $c$ on the weak augmentation $x_i^w$. At each iteration, indices are partitioned into "confident" samples (where model confidence on the annotated class exceeds $\tau_{y_i}$) and "non-confident" samples. This batch-local, adaptive mechanism sidesteps the need for manual or momentum-based threshold tuning. Its statistical foundation allows it to differentiate heavily mislabeled samples from reliable annotations without requiring explicit knowledge of the noise rate (Gera et al., 2023).
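The batch-local thresholding step can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' implementation; the function name and the choice to use the plain batch mean per class are assumptions consistent with the description above.

```python
import torch

def partition_by_adaptive_threshold(probs_weak, labels):
    """Partition a mini-batch into confident / non-confident indices.

    probs_weak: (N, C) softmax outputs on weak augmentations
    labels:     (N,)   annotated class indices

    The per-class threshold tau_c is the mean predicted probability of
    class c over the batch's samples annotated with c.
    """
    num_classes = probs_weak.shape[1]
    # Confidence of each sample in its own annotated class.
    conf = probs_weak.gather(1, labels.unsqueeze(1)).squeeze(1)  # (N,)
    # Per-class adaptive threshold: mean confidence among samples of that class.
    tau = torch.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            tau[c] = conf[mask].mean()
    confident = conf >= tau[labels]
    return confident, ~confident, tau
```

Because the thresholds are recomputed from each batch, no global hyperparameter or momentum schedule is needed, matching the "batch-local" property claimed above.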
3. Negative Class Consistency Loss
For samples flagged as non-confident, discarding them neglects potentially valuable information. The negative class consistency strategy introduces a secondary classifier head focused on negative class predictions. It outputs probabilities over the remaining $C-1$ classes (all but the annotated label). Consistency is enforced on the top-$k$ negative classes—those with the highest predicted probabilities—across weak and strong data augmentations: $\mathcal{L}_{\mathrm{neg}} = \frac{1}{|B_{nc}|} \sum_{i \in B_{nc}} \sum_{c \in \mathcal{T}_k(x_i^w)} \big( p_c^{-}(x_i^w) - p_c^{-}(x_i^s) \big)^2$, where $\mathcal{T}_k(x_i^w)$ selects the top-$k$ negative classes for each sample and $B_{nc}$ denotes the non-confident indices. This loss encourages the network to remain consistent in its rejection of specific non-ground-truth classes, exploiting the observation that the least probable classes under noise remain stable between augmentations. By integrating negative learning on non-confident examples, the model improves discrimination even with unreliable positive supervision, systematically exploiting the high prior probability of correct negative identification in multiclass FER datasets (Gera et al., 2023).
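A minimal sketch of such a top-$k$ negative-consistency term is given below. The squared-difference distance and the convention of selecting the top-$k$ set from the weak view are assumptions for illustration; the paper's exact formulation may differ.

```python
import torch

def negative_consistency_loss(neg_probs_weak, neg_probs_strong, labels, k=4):
    """Top-k negative-class consistency between weak and strong views.

    neg_probs_weak/strong: (N, C) negative-head probabilities per view
    labels: (N,) annotated labels, excluded from the negative-class set
    """
    masked = neg_probs_weak.clone()
    # Mask out the annotated (ground-truth) class so it cannot be selected.
    masked.scatter_(1, labels.unsqueeze(1), float("-inf"))
    # Top-k negative classes are chosen on the weak view and held fixed.
    topk_idx = masked.topk(k, dim=1).indices          # (N, k)
    pw = neg_probs_weak.gather(1, topk_idx)
    ps = neg_probs_strong.gather(1, topk_idx)
    # Penalize disagreement on those negative classes across augmentations.
    return ((pw - ps) ** 2).mean()
```

When the two views agree exactly, the loss vanishes; any augmentation-induced drift on the selected negative classes is penalized quadratically.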
4. Combined Objective and Training Workflow
The overall training loss is the sum of the confident-sample cross-entropy and the negative class consistency loss, $\mathcal{L} = \mathcal{L}_{\mathrm{ce}} + \lambda \, \mathcal{L}_{\mathrm{neg}}$, with $\lambda$ controlling the balance (the reported setting is empirically robust). The training protocol begins with a warm-up phase in which all examples are treated as confident, after which each batch is dynamically partitioned using the adaptive thresholds. The backbone is a pre-trained ResNet-18 with dual classifier heads (positive and negative). Augmentation strategies combine weak (random crop and flip) and strong (RandAugment) transformations. Negative class consistency operates on the four ($k=4$) most probable negative classes, with all implementation details precisely specified in the framework (Gera et al., 2023).
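The combined objective per batch can be sketched as below. The function name, the `lam` placeholder for the balance coefficient, and the empty-mask guard are illustrative assumptions; they are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def total_loss(logits_pos_weak, conf_mask, labels, l_neg, lam=1.0):
    """Cross-entropy on confident samples plus the weighted
    negative-consistency term l_neg (already computed elsewhere).

    logits_pos_weak: (N, C) positive-head logits on weak augmentations
    conf_mask:       (N,)   boolean mask of confident samples
    """
    if conf_mask.any():
        l_ce = F.cross_entropy(logits_pos_weak[conf_mask], labels[conf_mask])
    else:
        # No confident samples (possible early in training after warm-up):
        # contribute zero while keeping the computation graph connected.
        l_ce = logits_pos_weak.sum() * 0.0
    return l_ce + lam * l_neg
```

During warm-up, `conf_mask` would simply be all-True, reducing the objective to standard cross-entropy plus the consistency term.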
5. Empirical Evaluation under Annotation Noise
NCCTFER is extensively validated under both symmetric (random class-flip at rates up to 80%) and asymmetric (confusion-pair swap) label noise regimes on the RAF-DB and FERPlus benchmarks. The method yields substantial improvements in test accuracy over recent baselines (SCN, RUL, EAC), with the largest margins at high noise rates:
| Noise rate (FERPlus, test accuracy %) | SCN | RUL | EAC | NCCTFER |
|---|---|---|---|---|
| 10% | 84.28 | 85.94 | 87.03 | 86.29 |
| 60% | 68.06 | 73.54 | 79.82 | 80.20 |
| 80% | 37.62 | 43.39 | 62.19 | 68.03 |
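The symmetric and asymmetric corruption protocols used in these experiments can be simulated as follows. This is a generic sketch: the function names and the confusion-pair map are illustrative, not taken from the paper's code.

```python
import random

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip each label to a uniformly random *different* class with
    probability noise_rate (the symmetric regime)."""
    rng = random.Random(seed)
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in range(num_classes) if c != y]))
        else:
            noisy.append(y)
    return noisy

def inject_asymmetric_noise(labels, noise_rate, pair_map, seed=0):
    """Swap labels along fixed confusion pairs (classes commonly
    mistaken for each other) with probability noise_rate."""
    rng = random.Random(seed)
    return [pair_map.get(y, y) if rng.random() < noise_rate else y
            for y in labels]
```

Asymmetric noise is the harder regime because the corruption is systematic (concentrated on plausible confusions) rather than uniformly random, mirroring the real-world annotation discrepancy discussed in Section 1.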
Qualitative analyses using t-SNE and Grad-CAM reveal tighter intra-class clustering and sharper model attention on facial regions under noise. Confidence plots demonstrate recovery of true class predictions even at high noise fractions. On asymmetric RAF-DB noise (30%), a +4.52% improvement over the baseline is observed. The framework achieves 4–28% accuracy improvements on RAF-DB and 3.3–31.4% on FERPlus across noise regimes, requiring no prior knowledge of the noise rate and no additional networks (Gera et al., 2023).
6. Significance and Broader Implications
Annotation discrepancy and negative learning methodologies exemplified by NCCTFER represent a principled advance for learning in the presence of noisy labels. Adaptive thresholding and negative class consistency formalize intrinsically robust model behaviors, especially where negative evidence is statistically easier to extract than positive identification. A plausible implication is that similar dual-head architectures and loss partitionings could be extended to other domains affected by annotation noise, provided that negative class stability under transformations persists. The absence of noise rate dependency broadens applicability to diverse, real-world noisy datasets. These techniques significantly improve both quantitative generalization and qualitative representational structure under challenging supervision conditions (Gera et al., 2023).