
Guided Progressive Label Correction (gPLC)

Updated 27 January 2026
  • gPLC is an iterative framework that alternates model-driven auto-corrections and human validations to progressively clean noisy labels.
  • It employs high-confidence filtering to update annotations permanently, reducing redundant corrections and minimizing human workload.
  • Empirical results in NLP and vision benchmarks demonstrate near-oracle accuracy with significantly lower relabeling effort.

Guided Progressive Label Correction (gPLC) encompasses a class of iterative algorithms for denoising labeled datasets, particularly under nontrivial label noise—including feature-dependent, systematic, or adversarially structured noise—by alternating model-guided and human-in-the-loop interventions. The defining characteristic is a loop in which only high-confidence or high-uncertainty examples are addressed, corrections are retained permanently, and the candidate pool shrinks in each round. This approach is applicable to supervised and semi-supervised problems in NLP, computer vision, and beyond, enabling recovery of near-oracle model performance with substantially less human effort than exhaustive relabeling. Empirical validation covers modular LLM-based systems, vision benchmarks, real-world noisy data, and task-specific contexts.

1. Foundational Principles and Algorithmic Structure

The canonical gPLC framework is realized via three core operations per iteration (Taneja et al., 2024):

1. Auto-correction (Self-Flips): The discriminative model, trained on the current dataset $D^{(t)} = \{(x_i, y_i^{(t)})\}$, identifies the subset $A^{(t)} = \{i : \max_y p_{\theta^{(t)}}(y \mid x_i) \geq \delta\}$ where prediction confidence exceeds a high threshold $\delta$. For each $i \in A^{(t)}$, the label is replaced with the model's top prediction $y_i^* = \arg\max_y p_{\theta^{(t)}}(y \mid x_i)$.

2. Human-Feedback Correction: Among the remaining examples, those with the highest misannotation scores $m_i^{(t)} = 1 - p_{\theta^{(t)}}(y_i^{(t)} \mid x_i)$ are flagged ($H^{(t)}$, the top $M$-fraction by $m_i^{(t)}$), and human annotators provide corrected labels $y_i^H$.
3. Filtering: Examples that have been auto-flipped or human-corrected are permanently removed from the candidate pool.

The dataset update is formalized as:

$$D^{(t+1)} = \{(x_i, y_i^*) : i \in A^{(t)}\} \cup \{(x_i, y_i^H) : i \in H^{(t)}\} \cup \{(x_i, y_i^{(t)}) : \text{otherwise}\}$$

with updated labels per example:

$$y_i^{(t+1)} = \begin{cases} y_i^* & \text{if } \max_y p_{\theta^{(t)}}(y \mid x_i) \geq \delta \\ y_i^H & \text{if } i \in H^{(t)} \\ y_i^{(t)} & \text{otherwise} \end{cases}$$

This “one-and-done” principle ensures that examples are processed at most once.
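The three operations above can be sketched as a single round of the loop. This is a minimal illustration, not the reference implementation: the function name `gplc_iteration`, the `oracle` callable standing in for human annotators, and the default values of `delta` and `m_frac` are all assumptions for the sketch.

```python
import numpy as np

def gplc_iteration(probs, labels, active, delta=0.95, m_frac=0.025, oracle=None):
    """One gPLC round: auto-flip confident examples, query humans on the
    top-M most suspicious ones, then retire both groups from the pool.

    probs  : (n, k) model posteriors p_theta(y|x) on the current dataset
    labels : (n,) current labels y_i^(t)
    active : (n,) bool mask of examples still in the candidate pool
    oracle : callable(indices) -> corrected labels (stand-in for annotators)
    """
    labels = labels.copy()
    conf = probs.max(axis=1)                      # max_y p(y | x_i)
    pred = probs.argmax(axis=1)

    # 1. Auto-correction: flip labels where confidence >= delta
    auto = active & (conf >= delta)
    labels[auto] = pred[auto]

    # 2. Human feedback on the top-M fraction by misannotation score
    rest = active & ~auto
    scores = 1.0 - probs[np.arange(len(labels)), labels]  # m_i^(t)
    scores[~rest] = -np.inf                       # only pool members compete
    budget = int(m_frac * active.sum())
    human = np.zeros_like(active)
    if budget > 0:
        human[np.argsort(-scores)[:budget]] = True
        labels[human] = oracle(np.flatnonzero(human))

    # 3. Filtering: corrected examples leave the pool permanently
    active = active & ~auto & ~human
    return labels, active
```

In practice the model would be retrained on the updated labels between calls, so that confidence estimates reflect the cleaned data before the next round.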

2. Instantiations and Extended Methodologies

Various domain-adapted instantiations of gPLC exist, sharing the above scaffold while introducing modality-specific innovations:

NLP Modular LLM Datasets: ALC³ applies gPLC to noisy GPT-3.5–annotated data, with $\delta$-thresholded auto-flips, active human-assisted correction of top-uncertainty instances, and filtering, demonstrating rapid convergence to near-fine-tuned accuracy while relabeling well under 100% of the data (Taneja et al., 2024).

Vision: ProSelfLC (Wang et al., 2022) employs progressive, entropy-aware self-label correction, where the label update at iteration $t$ for data point $x$ is:

$$\tilde y^{(t)} = (1 - \lambda^{(t)}(x))\, y + \lambda^{(t)}(x)\, p_T^{(t)}(x)$$

with temperature-scaled predictions $p_T$ ($T < 1$ for sharpening), global trust $g(t)$ (a logistic function of training progress), and local trust $\ell(p)$ (e.g., maximum class confidence or normalized entropy):

$$\lambda^{(t)}(x) = g(t) \cdot \ell(p^{(t)}(x))$$

Here, human intervention is replaced by an adaptive trust schedule and regularization is cast as cross-entropy to the updated targets.
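The trust-weighted target above can be sketched in a few lines. This is an illustrative rendering, not ProSelfLC's exact recipe: the logistic steepness (12.0), its midpoint (0.5), and the default `temperature` are placeholder constants, and local trust is taken as maximum class confidence.

```python
import numpy as np

def trust_weighted_target(logits, y_onehot, t, total_steps, temperature=0.5):
    """Soft target y_tilde = (1 - lambda) * y + lambda * p_T, with
    lambda(x) = g(t) * l(p): global trust times local trust."""
    # Temperature-sharpened prediction p_T (T < 1 sharpens the softmax)
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)         # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)

    # Global trust g(t): logistic ramp over training progress
    g = 1.0 / (1.0 + np.exp(-12.0 * (t / total_steps - 0.5)))

    # Local trust l(p): maximum class confidence per example
    local = p.max(axis=-1, keepdims=True)

    lam = g * local                               # lambda^(t)(x)
    return (1.0 - lam) * y_onehot + lam * p
```

Early in training $g(t) \approx 0$, so targets stay close to the given labels; late in training confident predictions take over, which matches the guided-trust behavior described above.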

Face Recognition and Closed-Set Noise: The RepFace framework (Zhang et al., 2024) integrates early-stage Auxiliary Sample Cleaning (ASC), confident sample filtering, and progressive splitting into “clean,” “ambiguous,” and “noisy” groups with respective training strategies:

  • Clean: standard supervision,
  • Ambiguous: label robust fusion (fusing ground-truth and accumulated model predictions)
  • Noisy: closed-set label smoothing correction interpolating between original and “nearest-negative” labels.

Feature-Dependent Noise and Theoretical Guarantees: The approach of (Zhang et al., 2021) formalizes gPLC for instance-dependent noise, with model-driven label flipping restricted to examples where $|f(x) - \tfrac{1}{2}| \geq \theta$, with $\theta$ gradually lowered as training progresses. This method is provably Bayes-consistent under Poly-Margin Diminishing (PMD) noise conditions.

3. Mathematical Formalism and Theoretical Guarantees

Theoretical analysis establishes the consistency and convergence of gPLC under mild conditions (Zhang et al., 2021):

  • Starting from noisy labels $\{\tilde y_i\}$ and an initial classifier $f^{(0)}$, the corrected region (where labels agree with the Bayes-optimal classifier) expands as only predictions with confidence $|f(x) - \tfrac{1}{2}| \geq \theta$ are flipped.
  • Under the PMD condition and suitable schedule for θ\theta, the method guarantees with high probability that the resulting classifier achieves near-Bayes accuracy on all but a vanishing “boundary” region.
  • Progressively relaxing θ\theta grows the clean region, while early rounds restrict flipping to only the purest examples to avoid propagating errors.
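The threshold schedule can be sketched for the binary case. This is a toy version under simplifying assumptions: the classifier output `f_x` is held fixed across rounds for brevity (the actual method retrains between rounds), and `theta0` and the geometric `decay` are illustrative, not the schedule from the paper.

```python
import numpy as np

def progressive_flip(f_x, noisy_labels, rounds=5, theta0=0.4, decay=0.8):
    """Confidence-gated flipping for binary labels: flip y_i toward
    1[f(x_i) > 1/2] only when |f(x_i) - 1/2| >= theta, relaxing theta
    each round so the clean region grows from the purest examples out."""
    labels = noisy_labels.copy()
    theta = theta0
    for _ in range(rounds):
        confident = np.abs(f_x - 0.5) >= theta    # margin clears the gate
        labels[confident] = (f_x[confident] > 0.5).astype(labels.dtype)
        theta *= decay                            # admit less-pure regions later
    return labels
```

Examples near the decision boundary are never touched within the schedule's horizon, which is exactly the "vanishing boundary region" the guarantee tolerates.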

A general schema for trust weighting is given by:

$$\lambda^{(t)}(x) = g(t) \cdot \ell(p^{(t)}(x))$$

where $g(t)$ is a monotonically increasing schedule (e.g., logistic), and $\ell(p)$ is a per-sample "entropy confidence." This guided weighting ensures that early noisy predictions have negligible influence, while later confident predictions dominate.

4. Empirical Results and Comparative Performance

Empirical benchmarking consistently finds that gPLC achieves or surpasses state-of-the-art performance with significantly reduced human labeling cost, regardless of input modality or noise structure (Taneja et al., 2024, Wang et al., 2022, Zhang et al., 2021, Zhang et al., 2024, Yagi et al., 2021, Bäuerle et al., 2018).

  • ATIS: Oracle (fine-tuned) accuracy reached after human review of 27.5% of data (original noise rate: 29.8%).
  • CoNLL: Within 1% of oracle F1 after 55% relabeling (original noise: 57.4%).
  • QNLI: Near-oracle after 15% relabeling (original noise: 15.1%).

Vision Benchmarks:

  • CIFAR-100 with high synthetic noise (Wang et al., 2022): ProSelfLC obtains up to +20 points over CCE, +7 points over Boot-soft under 0.6 symmetric noise.
  • Clothing1M and Food-101N: ProSelfLC and PLC outperform CleanNet, PENCIL, SELFIE, CleanOnly training.
  • Face Recognition (closed-set noise) (Zhang et al., 2024): RepFace achieves SOTA on CASIA-WebFace and MS1MV2 under 20% noise, equaling or surpassing strong baselines (BoundaryFace, RVFace).
  • On hand-object contact prediction, gPLC improves frame-wise accuracy by +2 points and boundary score by +4.5 points over supervised-only learning, and recovers nearly perfect accuracy after heavy synthetic corruption.

5. Human-in-the-Loop Dynamics and Practical Guidelines

Human feedback is administered solely on the top $M$-fraction (2–5%) of most uncertain or likely-misannotated examples per iteration. Once an example is corrected—either by auto-flip or human annotation—it is never reconsidered. This progressive narrowing process greatly economizes annotation effort. As corrections accumulate, the model's overall confidence rises, leading to progressively fewer required human queries.

Practical heuristics (Taneja et al., 2024, Bäuerle et al., 2018, Wang et al., 2022):

  • M = 2.5–5% per round is sufficient in high-noise NLP settings.
  • Hard and soft thresholding (e.g., confidence $\geq \delta$, or $d(x) > \tau$ in RepFace) adaptively target the most credible flips.
  • Interleaved retraining anchors model predictions after each round, while task-specific regularization (e.g., temperature-sharpened softmax in ProSelfLC) minimizes entropy in the corrected region.

6. Visualization and Model-Agnostic Extensions

gPLC is compatible with interactive, visual correction loops (Bäuerle et al., 2018), in which classifier-driven error scores (e.g., Class Interpretation Error, Instance Interpretation Error, Similarity Error) are used to rank and present the most suspicious instances to users for batch correction. These cycles leverage confusion matrices, projection plots, and saliency maps to expedite expert decision-making, culminating in high label purity with few iterations.

The underlying logic—progressive, model-guided correction with permanent memory of resolved cases—enables adaptation across domains with heterogeneous data structures and noise models, including tabular, text, sequence, and multi-modal data.

7. Impact, Limitations, and Outlook

gPLC offers a scalable solution for correcting high-noise, large-scale datasets where fully automatic denoising is infeasible and exhaustive human relabeling is intractable. By concentrating both algorithmic and human effort on the most impactful cases at each stage, it delivers near-oracle downstream performance efficiently (Taneja et al., 2024, Wang et al., 2022, Zhang et al., 2024, Zhang et al., 2021, Yagi et al., 2021, Bäuerle et al., 2018).

Potential limitations include reliance on initial model quality—especially in regions of high ambiguity—and the possibility of error propagation if early rounds are insufficiently strict. Nonetheless, the methodology is robust across architectures, domains, and noise typologies, and open directions remain for further theoretical refinement and hybridization (e.g., integration with meta-weighting, co-training, or bi-tempered cross-entropy).

