Semi-Supervised Learning Empirical Maximum Likelihood Estimator
- The paper introduces SSLEMLE, which integrates labeled and unlabeled data via empirical likelihood to improve consistency, efficiency, and robustness.
- It details multiple frameworks, including penalized GLMs and mixture models, that demonstrate significant performance gains over supervised-only methods.
- The estimator offers rigorous theoretical guarantees, such as asymptotic normality and semiparametric efficiency, with practical algorithms ensuring faster convergence.
A Semi-Supervised Learning Empirical Maximum Likelihood Estimator (SSLEMLE) is any estimator that leverages the empirical likelihood or maximum likelihood principle to incorporate both labeled and unlabeled data for statistical estimation or prediction. SSLEMLEs have emerged as a central tool in semi-supervised inference, encompassing both parametric and nonparametric models. Key methods span empirical likelihood stacking (Wang et al., 18 Dec 2025), penalized GLM frameworks (Laria et al., 2020), exponential-tilt mixture models (Tian et al., 2023), contrastive-pessimistic approaches (Loog, 2015), and extensions to unmatched data regression (Balabdaoui et al., 27 Jan 2026). The common goal is to improve estimation efficiency, robustness, and inference quality compared to supervised-only methods, typically with guarantees on consistency, variance, and asymptotic optimality.
1. Core Principles and Mathematical Formulation
SSLEMLE approaches construct loss functions or likelihoods that explicitly blend labeled and unlabeled observations, often via moment constraints, weighted likelihood ratios, or pseudo-likelihood terms. In its archetypal empirical likelihood form (Wang et al., 18 Dec 2025), SSLEMLE solves

$$\max_{\theta,\,p_1,\dots,p_n}\ \sum_{i=1}^n \log p_i \quad \text{subject to} \quad p_i \ge 0,\ \ \sum_{i=1}^n p_i = 1,\ \ \sum_{i=1}^n p_i\, g(Z_i;\theta) = 0,$$

with composite moment functions $g = (s^\top, h^\top)^\top$, where $s(Z_i;\theta)$ is a supervised score, $h(Z_i;\theta)$ centers auxiliary moments built from model predictions $\hat f(X_i)$, and $p_1,\dots,p_n$ are empirical likelihood weights. The estimator solves the resulting profile likelihood over the parameter space.
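As a concrete illustration, the inner empirical likelihood step (profiling out the weights for a fixed parameter value) can be sketched as below. The function name `el_weights` and the plain Newton scheme are illustrative assumptions, not the cited paper's implementation; only the generic moment matrix is taken as given.

```python
import numpy as np

def el_weights(g, n_iter=50, tol=1e-10):
    """Inner empirical-likelihood step: given an (n, k) matrix of moment
    values g_i = g(Z_i; theta), find the Lagrange multiplier lam solving
    sum_i g_i / (1 + lam^T g_i) = 0, and return the implied weights
    p_i = 1 / (n * (1 + lam^T g_i)).  Plain Newton iteration on lam."""
    n, k = g.shape
    lam = np.zeros(k)
    for _ in range(n_iter):
        denom = 1.0 + g @ lam                       # (n,) must stay positive
        score = (g / denom[:, None]).sum(axis=0)    # residual of the lam-equation
        hess = -(g / denom[:, None] ** 2).T @ g     # its Jacobian (neg. definite)
        step = np.linalg.solve(hess, -score)
        # Step-halving guard: keep all denominators positive
        while np.any(1.0 + g @ (lam + step) <= 0):
            step *= 0.5
        lam = lam + step
        if np.linalg.norm(step) < tol:
            break
    p = 1.0 / (n * (1.0 + g @ lam))
    return p, lam
```

At convergence the weights sum to one and the weighted moments vanish; profiling then reduces the outer problem to an ordinary optimization over the model parameters.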
Other frameworks utilize penalized maximum likelihood for GLMs with surrogate terms on transformed unlabeled features, e.g. s²net (Laria et al., 2020), which schematically minimizes

$$\ell(\beta;\, X_L, y_L) \;+\; \gamma\, \tilde\ell(\beta;\, T(X_U)) \;+\; \lambda_1 \|\beta\|_1 \;+\; \lambda_2 \|\beta\|_2^2.$$

Here $\gamma$ modulates the influence of the unlabeled pseudo-samples, and the transformation $T(\cdot)$ encodes geometric structure extracted from $X_U$ via SVD and centering.
For mixture models and classification, SSLEMLEs can target maximization over the joint labeled-unlabeled likelihood under exponential-tilt constraints, nonparametric support, and label-shift adjustment (Tian et al., 2023): the class-conditional feature densities are linked by an exponential tilt, $p_1(x) = \exp\{\alpha + \beta^\top x\}\, p_0(x)$, and the unlabeled features follow the mixture $\pi\, p_1(x) + (1-\pi)\, p_0(x)$ with its own class proportion $\pi$. The MCPL estimator for likelihood-based classifiers contrasts its improvement against the supervised solution and adopts a pessimistic (worst-case) soft labeling to guarantee log-likelihood monotonicity (Loog, 2015).
2. Optimization Algorithms and Implementation
Optimization of SSLEMLEs varies depending on problem structure:
- Empirical likelihood stacking: Profile likelihood solved via a Lagrange multiplier $\lambda$; closed-form weights $p_i = \{n(1 + \lambda^\top g(Z_i;\theta))\}^{-1}$, with $\hat\theta$ solving the stacked moment equations (Wang et al., 18 Dec 2025).
- Penalized GLM (s²net): Global convex minimization using FISTA, incorporating both soft-thresholding for sparsity (ℓ₁) and ridge scaling (ℓ₂), with gradient and quadratic surrogate backtracking (Laria et al., 2020). Updates have explicit forms amenable to efficient iterative solution.
- Exponential-tilt mixture models: Saddle-point profile likelihood in the tilt and proportion parameters, alternating between E-step estimates for the nonparametric masses and Newton–Raphson M-steps for the parametric components (Tian et al., 2023).
- Contrastive-Pessimistic (MCPL): Alternating maximization over θ and worst-case minimization over soft labels $q$; projected gradient methods for the $q$-updates, closed-form θ-updates in exponential-family models (Loog, 2015).
- Unmatched regression: Direct maximization of aggregated likelihoods via line-search/Newton steps with gradient and Hessian easily computable from component summations (Balabdaoui et al., 27 Jan 2026).
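The FISTA scheme used in the penalized-GLM branch can be sketched for the simplest case, an elastic-net penalized squared loss; the unlabeled surrogate term of the actual s²net objective is omitted here, and the function name `fista_enet` is illustrative.

```python
import numpy as np

def fista_enet(X, y, lam1, lam2, n_iter=500):
    """FISTA for the elastic-net penalized least-squares objective
    0.5/n * ||y - X b||^2 + lam1 * ||b||_1 + 0.5 * lam2 * ||b||^2.
    Soft-thresholding handles the l1 part; the ridge term enters the
    smooth gradient and the Lipschitz constant."""
    n, d = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n + lam2   # Lipschitz const. of smooth part
    b = z = np.zeros(d)
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y) / n + lam2 * z
        w = z - grad / L
        # Proximal step: soft-thresholding at level lam1 / L
        b_new = np.sign(w) * np.maximum(np.abs(w) - lam1 / L, 0.0)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        z = b_new + (t - 1.0) / t_new * (b_new - b)  # momentum extrapolation
        b, t = b_new, t_new
    return b
```

The same proximal-gradient structure carries over when the smooth part is a GLM deviance plus an unlabeled surrogate: only the gradient and the Lipschitz bound change, while the soft-thresholding step is untouched.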
3. Statistical Properties and Efficiency Bounds
SSLEMLEs uniformly provide rigorous statistical guarantees:
- Consistency and asymptotic normality: For empirical likelihood stacking and unmatched MLEs, estimators satisfy central limit theorem-type results, e.g. $\sqrt{n}(\hat\theta-\theta^*) \overset{d}{\to} \mathcal N(0, V_h)$ (Wang et al., 18 Dec 2025), with analogous limits in the unmatched setting (Balabdaoui et al., 27 Jan 2026).
- Semiparametric efficiency: The empirical likelihood estimator attains the efficiency bound if auxiliary moments span the “predictable” score component (Wang et al., 18 Dec 2025). In mixture models, the semi-supervised estimator is strictly more efficient than supervised logistic regression under label shift (Tian et al., 2023).
- Guarantees on monotonicity: MCPL estimators guarantee that the semi-supervised solution is never worse, and often strictly better, in labeled log-likelihood than the supervised estimator, independent of cluster assumptions (Loog, 2015).
- Finite-sample volume gain: In unmatched regression, explicit volume-ratio formulas quantify the improvement over classical MLE in the large-unlabeled regime (Balabdaoui et al., 27 Jan 2026).
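The variance reduction behind these guarantees is visible already in the simplest stacked-moment instance, mean estimation with an auxiliary prediction: the supervised mean is corrected by the gap between predictions on the labeled sample and on the pooled sample. The helper `ss_mean` and the simulation design are illustrative, not taken from the cited papers.

```python
import numpy as np

def ss_mean(y_lab, f_lab, f_all):
    """Semi-supervised mean estimate using auxiliary predictions f:
    the supervised mean of y, corrected by the labeled-vs-pooled gap in
    the prediction means.  When f correlates with y, the correction
    cancels part of the sampling noise of the labeled mean."""
    return y_lab.mean() - (f_lab.mean() - f_all.mean())
```

A Monte Carlo comparison of `ss_mean` against the plain labeled mean, with a predictor that explains most of the outcome variance and a large unlabeled pool, shows a markedly smaller variance for the semi-supervised estimator.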
4. Paradigms in Application Domains
The SSLEMLE framework is applicable across numerous statistical learning scenarios:
| Domain | Typical SSLEMLE Objective | Representative Formulation |
|---|---|---|
| Regression (GLM, linear, unmatched) | Penalized loss plus pseudo-likelihood, or mixture likelihood for unmatched data | (Laria et al., 2020); (Balabdaoui et al., 27 Jan 2026) |
| Classification (mixture, exponential-tilt) | Stacked likelihood for labeled/unlabeled, with explicit mixture model and profile over class proportions | (Tian et al., 2023) |
| General inference with auxiliary predictions | Empirical likelihood stacking with external model moments | (Wang et al., 18 Dec 2025) |
| Distribution alignment (domain translation) | Entropic-OT-regularized likelihood matching paired/unpaired data | (Persiianov et al., 2024) |
Typical formulations impose minimal or no structural assumptions beyond convexity, regularity, or exponential family structure, with tuning parameters (e.g. the weight on the unlabeled-data term) modulating the contribution of unlabeled data.
5. Theoretical Guarantees, Assumptions, and Limitations
Theoretical analysis of SSLEMLEs establishes:
- Convexity and uniqueness: Estimators are unique and globally optimal when likelihoods are strictly concave or empirical likelihood constraints are full rank (Laria et al., 2020; Loog, 2015).
- Monotonic improvement: For MCPL methods, the semi-supervised estimator strictly improves over the supervised solution with high probability in continuous covariate models (Loog, 2015).
- Asymptotic efficiency: Covariance formulas are explicit in terms of Fisher information and auxiliary moment structure, with semiparametric bounds attainable if the moment family is sufficiently rich (Wang et al., 18 Dec 2025; Tian et al., 2023).
- No cluster/manifold assumption: MCPL-based SSLEMLEs do not require explicit structure in unlabeled data; improvement is guaranteed under mild distributional regularity (Loog, 2015).
- Label/unlabeled ratio dependence: Magnitude of efficiency gain scales inversely with the ratio of labeled to unlabeled data in regression (Balabdaoui et al., 27 Jan 2026).
6. Empirical Performance and Practical Considerations
Empirical studies consistently demonstrate SSLEMLE efficiency gains and enhanced predictive performance:
- GLMs: s²net outperforms supervised elastic-net and other semi-supervised competitors in both regression and classification across simulated and multiple real-data scenarios (Laria et al., 2020).
- Regression (unmatched data): SSLEMLE outperforms OLS and matched-sample MLE on synthetic and real datasets (Combined Cycle Power Plant); >90% improvement in test MSE observed with moderate to large unlabeled ratio (Balabdaoui et al., 27 Jan 2026).
- Classification under label-shift: Variance and bias improvements over supervised logistic regression scale with divergence of class proportions; the gain vanishes when label proportions match (Tian et al., 2023).
- General prediction-powered inference: SSLEMLE enables shorter confidence intervals and lower MSE, attaining nominal coverage and calibration (Wang et al., 18 Dec 2025).
- Distribution translation (domain adaptation): SSLEMLE with entropic-OT formulation recovers multimodal conditionals and substantially enhances test log-likelihood over baseline methods; target-side unpaired data is particularly impactful (Persiianov et al., 2024).
- Algorithmic stability: FISTA-based and Newton–Raphson schemes converge to unique minima, projection and gradient steps are computationally tractable at scale, and empirical results match theoretical predictions.
7. Extensions and Specialized Constructions
Recent works extend SSLEMLE methodology to complex domains:
- Auxiliary construction: Basis expansions for auxiliary moments or data-driven cross-fitting ensure flexibility and efficiency (Wang et al., 18 Dec 2025).
- Domain translation/inverse optimal transport: Energy-based models and entropic regularization unify likelihood maximization and optimal transport, allowing parameterization via Gaussian mixtures and closed-form normalization (Persiianov et al., 2024).
- Mixture models/EM approaches: Semi-supervised EM iterations accelerate convergence rates compared to unsupervised EM, quantified by explicit contraction bounds (Sula et al., 2022). Labeled data can rescue EM from slow convergence in weak-identifiability regimes.
- Unmatched regression: SSLEMLE methodology applies even when the link between features and response is not paired, expanding the range of applicable datasets and facilitating deconvolution-based approaches (Balabdaoui et al., 27 Jan 2026).
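The semi-supervised EM iteration referenced above (Sula et al., 2022) can be sketched for a two-component, unit-variance Gaussian mixture in one dimension: labeled points keep hard responsibilities, unlabeled points receive soft E-step responsibilities, and both are pooled in the M-step. The function `ss_em_gmm` and its fixed-iteration stopping rule are illustrative simplifications.

```python
import numpy as np

def ss_em_gmm(x_lab, y_lab, x_unlab, n_iter=100):
    """Semi-supervised EM for a two-component 1-D Gaussian mixture with
    unit variances.  Labeled points (y in {0,1}) have fixed hard
    responsibilities; unlabeled points get soft posteriors each E-step.
    Returns (mu0, mu1, pi1)."""
    mu = np.array([x_lab[y_lab == 0].mean(), x_lab[y_lab == 1].mean()])
    pi1 = y_lab.mean()
    for _ in range(n_iter):
        # E-step on unlabeled data only: posterior of class 1
        log_r1 = np.log(pi1) - 0.5 * (x_unlab - mu[1]) ** 2
        log_r0 = np.log(1.0 - pi1) - 0.5 * (x_unlab - mu[0]) ** 2
        r1 = 1.0 / (1.0 + np.exp(log_r0 - log_r1))
        # M-step pools labeled (hard) and unlabeled (soft) assignments
        w1 = y_lab.sum() + r1.sum()
        w0 = (1.0 - y_lab).sum() + (1.0 - r1).sum()
        mu = np.array([
            (x_lab[y_lab == 0].sum() + ((1.0 - r1) * x_unlab).sum()) / w0,
            (x_lab[y_lab == 1].sum() + (r1 * x_unlab).sum()) / w1,
        ])
        pi1 = w1 / (len(x_lab) + len(x_unlab))
    return mu[0], mu[1], pi1
```

Anchoring the responsibilities of even a small labeled subsample breaks the label-swapping symmetry of unsupervised EM, which is the mechanism behind the faster contraction rates quantified in the reference.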
These developments substantiate SSLEMLE as a unified, principled approach to semi-supervised estimation, encompassing supervised, unsupervised, and prediction-powered regimes, with concrete theoretical and empirical guarantees across diverse statistical and machine learning domains.