Pseudo-labeling Mechanism (puPL) Survey
- Pseudo-labeling Mechanism (puPL) is a family of techniques that assigns provisional labels to unlabeled data, extending the effective supervision signal.
- puPL employs strategies like hard and soft label assignments, confidence-based selection, and uncertainty estimation to improve model training.
- Its applications span semi-supervised, positive-unlabeled, and weakly supervised settings, yielding measurable gains in vision, medical imaging, and speech recognition.
Pseudo-labeling Mechanism (puPL)
The pseudo-labeling mechanism, often abbreviated as "puPL" in the literature, is a family of techniques for leveraging unlabeled data by assigning provisional (pseudo-) labels during training—thereby expanding the effective supervision signal beyond the often scarce annotated portion. Although pseudo-labeling arises across semi-supervised, weakly supervised, positive-unlabeled (PU), and unsupervised learning, the specific technical instantiations of puPL vary significantly depending on application domain, data modality, and the statistical protocols for selection and use of pseudo-labels. The following sections provide a rigorous and comprehensive survey of puPL mechanisms, formal foundations, algorithmic structures, use cases, and theoretical considerations as developed across contemporary research.
1. Formal Foundations and Taxonomy
At its core, pseudo-labeling refers to the process of inferring output labels for unlabeled samples using predictions from a model $f_\theta$, and recycling these $(x, \hat{y})$ pairs as if they were ground-truth labels, typically with explicit mechanisms to control the confidence of assignments and to avoid the accumulation of error inherent in self-training schemes (Kage et al., 2024, Rizve et al., 2021).
- Definition (Pseudo-label): For a classifier $f_\theta$, the pseudo-label for an unlabeled sample $x$ is $\hat{y} = f_\theta(x)$ in soft (distributional) form or $\hat{y} = \arg\max_c f_\theta(x)_c$ in hard (one-hot) form (Kage et al., 2024).
- General Structure: Given a labeled set $\mathcal{D}_L = \{(x_i, y_i)\}_{i=1}^{n_L}$ and an unlabeled set $\mathcal{D}_U = \{x_j\}_{j=1}^{n_U}$, pseudo-labeling alternates or schedules supervised updates on $\mathcal{D}_L$ with unsupervised updates using $(x_j, \hat{y}_j)$ for selected $x_j \in \mathcal{D}_U$.
- Variants: Pseudo-labeling operates in settings ranging from standard semi-supervised learning (SSL) (Nguyen et al., 2022), self-supervised learning (pseudo-labels via clustering or contrastive pretext tasks) (Acharya et al., 2022, Acharya et al., 2024), positive-unlabeled learning (Dorigatti et al., 2022, Yamane et al., 11 Aug 2025), and partial label learning (Saravanan et al., 2024).
This foundational definition underlies all algorithmic strategies surveyed in the subsequent sections.
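The general structure above can be sketched as a single confidence-gated self-training round. This is an illustrative toy (nearest-centroid "classifier", made-up function names), not the implementation of any cited method:

```python
import numpy as np

def fit_centroids(X, y, n_classes):
    """Nearest-centroid "model": one centroid per class."""
    return np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])

def predict_proba(centroids, X):
    """Softmax over negative squared distances to the class centroids."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    e = np.exp(-d + d.min(axis=1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

def pseudo_label_round(X_l, y_l, X_u, n_classes=2, tau=0.8):
    """Train on labeled data, pseudo-label confident unlabeled points,
    and refit on the augmented set (one self-training round)."""
    centroids = fit_centroids(X_l, y_l, n_classes)
    p = predict_proba(centroids, X_u)
    conf, y_hat = p.max(axis=1), p.argmax(axis=1)
    keep = conf >= tau                      # confidence-gated selection
    X_aug = np.concatenate([X_l, X_u[keep]])
    y_aug = np.concatenate([y_l, y_hat[keep]])
    return fit_centroids(X_aug, y_aug, n_classes), keep
```

In practice the round is repeated, with the threshold $\tau$ scheduled or learned, and with a stronger base model in place of centroids.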
2. Algorithmic Instantiations
The practical implementation of puPL mechanisms often centers on a repeated process of pseudo-label assignment, sample selection (often gated by confidence or uncertainty), aggregation into augmented labeled sets, and re-training. Significant algorithmic axes include the following:
| Aspect | Typical Choices/Approaches | Representative References |
|---|---|---|
| Pseudo-label Form | Hard (MAP class), Soft (probabilities), Distributional paths | (Likhomanenko et al., 2022, Rizve et al., 2021, Xu et al., 2023) |
| Selection Criterion | Confidence, uncertainty, statistical test, Bayesian utility | (Nguyen et al., 2022, Dorigatti et al., 2022, Rodemann, 2023) |
| Class Assignment | Greedy thresholding, k-NN voting, contrastive clustering | (Acharya et al., 2024, Saravanan et al., 2024, Acharya et al., 2022) |
| Update Mode | Fixed teacher, EMA/momentum teacher, dynamic re-labeling | (Van et al., 2022, Likhomanenko et al., 2022) |
| Augmentations | MixUp, CutMix, strong/weak DA, feature perturbation | (Saravanan et al., 2024, Bouayed et al., 2020) |
| Negative PL Support | Explicit negative labeling, Sinkhorn/OT allocation | (Rizve et al., 2021, Nguyen et al., 2022) |
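The "EMA/momentum teacher" update mode listed above can be sketched in a few lines; the momentum value and function name are illustrative choices:

```python
import numpy as np

def ema_update(teacher_params, student_params, momentum=0.999):
    """teacher <- m * teacher + (1 - m) * student, applied per parameter tensor.
    The teacher then generates pseudo-labels for the next student update."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```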
Hard vs. Soft Pseudo-Labels
Traditional puPL pipelines favor hard-label assignment (argmax), but soft labeling, including per-frame token distributions (Likhomanenko et al., 2022) and Bayesian pseudo-label posteriors (Xu et al., 2023), provides richer gradient information at the cost of an increased risk of degenerate solutions (e.g., per-frame collapse in ASR).
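A minimal sketch of the two label forms, computed from raw logits (the optional temperature sharpening is our illustrative addition):

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def soft_pseudo_labels(logits, temperature=1.0):
    """Soft targets: the full predicted class distribution, optionally
    sharpened with a temperature < 1."""
    return softmax(logits / temperature)

def hard_pseudo_labels(logits):
    """Hard targets: one-hot vector of the argmax (MAP) class."""
    return np.eye(logits.shape[-1])[logits.argmax(axis=-1)]
```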
Selection by Confidence and Uncertainty
Simple approaches rank unlabeled samples by the maximum predicted probability $\max_c f_\theta(x)_c$ and apply a fixed threshold, but modern mechanisms integrate predictive uncertainty (entropy, mutual information, deep ensembles) (Rizve et al., 2021, Dorigatti et al., 2022). Optimal selection can be formulated as a Bayesian decision problem, where the posterior expected utility of labeling each candidate $x$ is maximized (Rodemann, 2023).
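A sketch contrasting plain confidence thresholding with entropy-based selection (both thresholds are illustrative hyperparameters):

```python
import numpy as np

def select_by_confidence(probs, tau=0.95):
    """Keep samples whose top-class probability exceeds the threshold tau."""
    return probs.max(axis=1) >= tau

def select_by_entropy(probs, max_entropy=0.3):
    """Keep samples with low predictive entropy H(p) = -sum_c p_c log p_c."""
    h = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    return h <= max_entropy
```

Entropy uses the whole distribution, so it can reject samples whose top class is only marginally ahead of the runner-up even when the raw confidence looks acceptable.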
Assignment and Global Constraints
Some approaches treat labeling as an optimal transport (OT) allocation problem across the entire sample pool, stabilizing class proportions and counteracting assignment errors (Nguyen et al., 2022). In positive-unlabeled (PU) or partial-label settings, k-means, k-NN, or clustering in the learned embedding space is often used to separate or augment class boundaries (Acharya et al., 2024, Saravanan et al., 2024).
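A minimal Sinkhorn-style balancing sketch in the spirit of such OT allocation; the iteration count, matrix orientation (samples in rows), and exactly uniform class marginals are our simplifying choices:

```python
import numpy as np

def sinkhorn_assign(probs, n_iter=20):
    """Alternately rescale columns and rows so each sample keeps unit mass
    (rows sum to 1) while each class receives ~n/k total mass (balanced
    columns). Returns the balanced soft assignment matrix."""
    q = probs.astype(float).copy()
    n, k = q.shape
    for _ in range(n_iter):
        q *= (n / k) / q.sum(axis=0, keepdims=True)  # balance class columns
        q /= q.sum(axis=1, keepdims=True)            # unit mass per sample
    return q
```

Taking the row-wise argmax of the returned matrix yields hard pseudo-labels whose class proportions are stabilized against the model's momentary biases.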
Negative Pseudo-Labels and Label Smoothing
Recent puPL techniques assign negative pseudo-labels to samples with low confidence and low uncertainty, enhancing signal in multi-label settings and preventing confirmation bias (Rizve et al., 2021). Additionally, label smoothing (Saravanan et al., 2024) and entropy regularization are applied to regularize the pseudo-label distribution.
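A sketch of negative pseudo-label assignment in the spirit of such methods (the low-probability threshold is illustrative; uncertainty gating, as in the cited work, would add a second mask):

```python
import numpy as np

def negative_pseudo_labels(probs, neg_tau=0.05):
    """Boolean mask: True where a class probability is so low that the class
    can serve as a negative target ("this sample is not class c")."""
    return probs <= neg_tau
```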
3. Specialized Mechanisms in Positive-Unlabeled and Weakly-Supervised Settings
Special handling is required in settings where only positive and unlabeled (PU) data, or noisy/partial labels, are present:
- Positive-Unlabeled Contrastive Learning: In "Positive Unlabeled Contrastive Learning," pseudo-labels are implemented via probabilistic duplication: each unlabeled example enters the loss twice, once as a positive (with weight $\pi$, the positive class prior) and once as a negative (with weight $1-\pi$) (Acharya et al., 2022). There is no clustering or explicit pseudo-label selection in this formulation.
- Clustering-based Pseudo-Labeling: Alternatively, if embeddings separate well, a two-centroid k-means procedure distinguishes between positive and negative clusters (Acharya et al., 2024), with formal approximation guarantees for minimization of the k-means potential.
- Noisy Partial Label Learning: In settings with partial or noisy weak labels, SARI (Saravanan et al., 2024) computes pseudo-labels via k-NN voting weighted by feature similarity, applies quantile-based thresholds to select "reliable" pseudo-labeled pairs, and uses mix-up, label smoothing, and consistency regularization in training.
These specialized protocols ensure that pseudo-labeling mechanisms remain robust under ambiguity, class imbalance, and severe weak-label noise.
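The two-centroid clustering route can be sketched with plain k-means on embeddings. This is a toy numpy implementation with invented names; the cited methods operate on learned contrastive embeddings with k-means++ initialization:

```python
import numpy as np

def two_means(X, n_iter=50, seed=0):
    """Plain 2-means on the embedding matrix X (toy implementation)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)].astype(float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        for k in (0, 1):
            if (assign == k).any():
                centers[k] = X[assign == k].mean(axis=0)
    return centers, assign

def pu_pseudo_labels(X_pos, X_unlabeled):
    """Cluster the unlabeled embeddings; the cluster whose center lies
    closest to the mean of the known positives gets pseudo-label 1."""
    centers, assign = two_means(X_unlabeled)
    pos_cluster = ((centers - X_pos.mean(axis=0)) ** 2).sum(-1).argmin()
    return (assign == pos_cluster).astype(int)
```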
4. Theoretical Guarantees and Error Control
Theoretical underpinnings are a prominent component in recent puPL work:
- Posterior-Predictive Bayesian Selection: The pseudo-posterior predictive criterion derived by maximizing expected joint likelihood under the posterior distribution offers Bayes-optimal pseudo-label selection, mitigating confirmation bias. Analytical approximations via Laplace's method provide tractable scoring functions for robust sample inclusion (Rodemann, 2023).
- Uncertainty Calibration: Integration of uncertainty estimates (aleatoric, epistemic) directly correlates selection with prediction confidence, reducing the rate of label noise and confirmation bias, as evidenced in both theory and empirical ablations (Dorigatti et al., 2022, Rizve et al., 2021).
- Clustering and k-means Guarantees: In the context of contrastive learning, k-means++ initializations for pseudo-label clusters are proven to yield bounded approximation to the optimal risk, with cluster separation controlling downstream classifier error (Acharya et al., 2024).
- EM and Variational Formulations: Pseudo-labeling in segmentation can be interpreted as a block-EM, with successive E-steps via pseudo-label assignment and M-steps via joint training. End-to-end variational optimization of the pseudo-label threshold itself, with KL regularization, eliminates the need for manual hyperparameter selection (Xu et al., 2023).
In all cases, effective puPL design hinges on harmonizing sample selection, uncertainty control, and theoretical (often conservative) inclusion of candidate labels.
5. puPL in Practice: Modalities, Workflows, and Empirical Performance
The workflow and empirical properties of puPL methods are heavily domain-dependent.
- Computer Vision/Detection: In image-based 3D object detection (Ma et al., 2022), the classic teacher-student puPL pipeline (train teacher, generate pseudo-labels above a confidence threshold, re-train student) yields dramatic gains—pseudo-labels in some cases enable students to outperform ground-truth-only training with the same data budget.
- Medical Image Segmentation: Adaptive, per-image PU learning for pseudo-label selection targets difficult foreground-background discrimination in segmentation, leading to substantial Dice and coverage gains over fixed-threshold baselines, particularly when the label distribution drifts or artifacts are image-specific (Yamane et al., 11 Aug 2025).
- Speech Recognition: In end-to-end ASR, frame-wise soft pseudo-labeling with regularization and careful blend of hard and soft losses prevents model collapse and accelerates convergence relative to classic hard-path pseudo-labeling (Likhomanenko et al., 2022).
- Self-supervised and Unsupervised Learning: Pseudo-labels assigned via data augmentation, clustering, or contrastive pretexts force invariance in autoencoder representations (Bouayed et al., 2020), improving unsupervised classification accuracy and stability.
Empirical studies consistently report that the addition of robust pseudo-labeling mechanisms yields quantifiable performance improvements across domains and label-scarce scenarios.
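The hard/soft loss blend mentioned for the ASR setting can be sketched as a weighted sum of cross-entropies against the teacher's hard and soft pseudo-labels; the blend weight `alpha` and all function names are illustrative:

```python
import numpy as np

def cross_entropy(target, pred, eps=1e-12):
    """Mean cross-entropy between target and predicted distributions."""
    return float(-(target * np.log(pred + eps)).sum(axis=-1).mean())

def blended_pl_loss(student_probs, teacher_probs, alpha=0.5):
    """alpha * CE(hard pseudo-label) + (1 - alpha) * CE(soft pseudo-label)."""
    hard = np.eye(teacher_probs.shape[-1])[teacher_probs.argmax(axis=-1)]
    return alpha * cross_entropy(hard, student_probs) + \
        (1.0 - alpha) * cross_entropy(teacher_probs, student_probs)
```

Keeping some weight on the soft term retains the teacher's full distribution as a regularizer, which is one way to discourage the per-frame collapse noted above.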
6. Challenges, Limitations, and Directions for Future Research
- Confirmation Bias and Model Overconfidence: Overly aggressive inclusion of unfiltered pseudo-labels propagates the model's own biases and errors back into training, especially in regimes with few labeled samples or early overfitting. Bayesian, uncertainty-aware, and global-assignment-based puPL methods directly target these weaknesses (Rizve et al., 2021, Rodemann, 2023, Nguyen et al., 2022).
- Hyperparameter Sensitivity: The efficacy of puPL depends on selection thresholds, schedules, and the interaction with supervised/unlabeled batch sizes and losses. Several works propose end-to-end learning or adaptive meta-criteria (e.g., variational threshold networks) to relieve the need for manual tuning (Xu et al., 2023, Yamane et al., 11 Aug 2025).
- Scalability and Efficiency: Per-image or per-batch learning of PU classifiers, as in (Yamane et al., 11 Aug 2025), introduces significant computational cost for large datasets. Strategies for scalable, distributed implementations or efficient approximations remain active topics.
- Extensibility to Multi-Class and Multi-Label Settings: While the binary case is well-understood, robust puPL adaptation for generic multi-class, multi-label, or hierarchical settings is less developed, particularly when partial, overlapping, or noisy labels are present (Saravanan et al., 2024).
- Integration with Self-Supervised and Curriculum Strategies: Recent proposals advocate integrating puPL selection into curriculum learning pipelines, meta-learned schedules, or joint self-supervised–supervised optimization for increased robustness and sample efficiency (Kage et al., 2024).
Continuing research is expected to deliver principled, theoretically-grounded, and generalizable puPL mechanisms compatible with large-scale, complex, and weak-label domains.