
Semi-Supervised Learning Empirical Maximum Likelihood Estimator (SSLEMLE)

Updated 29 January 2026
  • The paper introduces SSLEMLE, which integrates labeled and unlabeled data via empirical likelihood to improve consistency, efficiency, and robustness.
  • It details multiple frameworks, including penalized GLMs and mixture models, that demonstrate significant performance gains over supervised-only methods.
  • The estimator offers rigorous theoretical guarantees, such as asymptotic normality and semiparametric efficiency, with practical algorithms ensuring faster convergence.

A Semi-Supervised Learning Empirical Maximum Likelihood Estimator (SSLEMLE) is any estimator that leverages the empirical likelihood or maximum likelihood principle to incorporate both labeled and unlabeled data for statistical estimation or prediction. SSLEMLEs have emerged as a central tool in semi-supervised inference, encompassing both parametric and nonparametric models. Key methods span empirical likelihood stacking (Wang et al., 18 Dec 2025), penalized GLM frameworks (Laria et al., 2020), exponential-tilt mixture models (Tian et al., 2023), contrastive-pessimistic approaches (Loog, 2015), and extensions to unmatched data regression (Balabdaoui et al., 27 Jan 2026). The common goal is to improve estimation efficiency, robustness, and inference quality compared to supervised-only methods, typically with guarantees on consistency, variance, and asymptotic optimality.

1. Core Principles and Mathematical Formulation

SSLEMLE approaches construct loss functions or likelihoods that explicitly blend labeled and unlabeled observations, often via moment constraints, weighted likelihood ratios, or pseudo-likelihood terms. In its archetypal empirical likelihood form (Wang et al., 18 Dec 2025), SSLEMLE solves

\max_{\{w_k\}}\ \prod_{k=1}^{n+N} w_k, \quad \text{subject to} \quad w_k \ge 0,\ \sum_k w_k = 1,\ \sum_k w_k\, u_k(\theta) = 0

with composite moment functions

u_k(\theta) = \begin{cases} g(X_k, Y_k; \theta), & k \le n \\ h^c(X_k, m(X_k)), & k > n \end{cases}

where $g$ is a supervised score, $h^c$ centers auxiliary moments built from model predictions $m(x)$, and $w_k$ are empirical likelihood weights. The estimator $\hat\theta = \arg\max_\theta \ell(\theta)$ maximizes the resulting profile likelihood over the parameter space.
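As a concrete illustration, the constrained problem above can be solved through its Lagrange dual. The following is a minimal NumPy sketch under stated assumptions: the moment matrix `u` is treated as given at a fixed $\theta$, and the damped Newton scheme is an illustrative solver choice, not the paper's implementation.

```python
import numpy as np

def el_weights(u, iters=100, tol=1e-12):
    """Solve the profile empirical likelihood problem for stacked
    moments u (shape (K, d)): find the Lagrange multiplier lam with
    sum_k u_k / (1 + lam @ u_k) = 0 by damped Newton steps, then
    return the weights w_k = 1 / (K * (1 + lam @ u_k))."""
    K, d = u.shape
    lam = np.zeros(d)
    for _ in range(iters):
        denom = 1.0 + u @ lam                                  # (K,)
        grad = (u / denom[:, None]).sum(axis=0)                # (d,)
        hess = -(u[:, :, None] * u[:, None, :]
                 / (denom ** 2)[:, None, None]).sum(axis=0)    # (d, d)
        step = np.linalg.solve(hess, -grad)
        # damp the step so every 1 + lam @ u_k stays positive
        while np.any(1.0 + u @ (lam + step) <= 1e-10):
            step *= 0.5
        lam = lam + step
        if np.linalg.norm(step) < tol:
            break
    w = 1.0 / (K * (1.0 + u @ lam))
    return w, lam
```

At the solution the weights sum to one automatically and the stacked moment constraint $\sum_k w_k u_k = 0$ holds, which is a quick sanity check on any implementation.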

Other frameworks utilize penalized maximum likelihood for GLMs with surrogate terms on transformed unlabeled features, e.g. s²net (Laria et al., 2020):

\mathcal L(\beta) = R(y_\ell \mid X_\ell; \beta) + \gamma_1 R(\bar y_\ell \mathbf{1} \mid T; \beta) + \lambda_1\|\beta\|_1 + \tfrac{\lambda_2}{2}\|\beta\|_2^2

Here, $\gamma_1$ modulates the influence of unlabeled pseudo-samples, and $T$ encodes geometric structure extracted from $X_u$ via SVD and centering.
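A toy evaluation of this objective can be written down directly. The sketch below assumes a linear model with squared-error loss, a simplified SVD-based stand-in for the transform $T$, and hypothetical default hyperparameters; it is not the s²net code.

```python
import numpy as np

def s2net_objective(beta, Xl, yl, Xu, gamma1=0.5, lam1=0.1, lam2=0.1, k=2):
    """Evaluate an s2net-style penalized loss for a linear model:
    supervised residual term, a surrogate term on SVD-transformed
    unlabeled rows pulled toward the labeled mean response, and
    elastic-net penalties.  The transform T here is a simplified
    illustrative choice, not the paper's exact construction."""
    # Supervised fit term R(y_l | X_l; beta).
    r_sup = 0.5 * np.mean((yl - Xl @ beta) ** 2)
    # T: project centered unlabeled features onto their top-k
    # right-singular directions.
    Xu_c = Xu - Xu.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xu_c, full_matrices=False)
    T = Xu_c @ Vt[:k].T @ Vt[:k]
    # Surrogate term R(ybar * 1 | T; beta).
    r_unsup = 0.5 * np.mean((yl.mean() - T @ beta) ** 2)
    # Elastic-net penalties lam1 * ||beta||_1 + lam2/2 * ||beta||_2^2.
    pen = lam1 * np.abs(beta).sum() + 0.5 * lam2 * (beta ** 2).sum()
    return r_sup + gamma1 * r_unsup + pen
```

The structure makes the trade-off explicit: $\gamma_1 = 0$ recovers the supervised elastic-net objective, while larger $\gamma_1$ weights the unlabeled surrogate more heavily.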

For mixture models and classification, SSLEMLEs can maximize the joint labeled-unlabeled likelihood under exponential-tilt constraints, nonparametric support, and label-shift adjustment (Tian et al., 2023):

\ell(\beta, \pi_\ell, \pi_u, G_0) = \text{(labeled)} + \text{(unlabeled mixture)} + \text{(profile over } G_0)

The MCPL estimator for likelihood-based classifiers contrasts its improvement against the supervised solution and adopts a pessimistic (worst-case) soft labeling to guarantee monotone improvement in labeled log-likelihood (Loog, 2015).

2. Optimization Algorithms and Implementation

Optimization of SSLEMLEs varies depending on problem structure:

  • Empirical likelihood stacking: Profile likelihood solved via Lagrange multipliers $\lambda$; closed-form weights $w_k(\theta) = \frac{1}{n+N}\bigl[1+\lambda(\theta)^\top u_k(\theta)\bigr]^{-1}$, with $\lambda(\theta)$ solving the stacked moment equations (Wang et al., 18 Dec 2025).
  • Penalized GLM (s²net): Global convex minimization using FISTA, incorporating both soft-thresholding for sparsity (ℓ₁) and ridge scaling (ℓ₂), with gradient and quadratic surrogate backtracking (Laria et al., 2020). Updates have explicit forms amenable to efficient iterative solution.
  • Exponential-tilt mixture models: Saddle-point profile likelihood in $(\beta, \pi_\ell, \pi_u, \alpha)$, alternating between E-step estimates for nonparametric masses and Newton–Raphson/M-step updates for parameters (Tian et al., 2023).
  • Contrastive-Pessimistic (MCPL): Alternating maximization over $\theta$ and worst-case minimization over soft labels $q_{ki}$; projected gradient methods for the $q$-updates, closed-form $\theta$-updates in exponential-family models (Loog, 2015).
  • Unmatched regression: Direct maximization of aggregated likelihoods via line-search/Newton steps with gradient and Hessian easily computable from component summations (Balabdaoui et al., 27 Jan 2026).
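The FISTA step mentioned for the penalized GLM can be sketched as a generic soft-thresholding loop; this is a standard elastic-net FISTA solver under a squared-error loss, not the s²net code itself, and the step size comes from the Lipschitz constant of the smooth part.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def fista_elastic_net(X, y, lam1=0.1, lam2=0.1, iters=500):
    """Minimal FISTA sketch for
        0.5/n * ||y - X b||^2 + lam1 * ||b||_1 + lam2/2 * ||b||_2^2,
    the smooth + l1 split used in s2net-style solvers."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n + lam2     # Lipschitz constant of smooth part
    b = np.zeros(p); z = b.copy(); t = 1.0
    for _ in range(iters):
        grad = X.T @ (X @ z - y) / n + lam2 * z  # gradient of smooth part
        b_new = soft_threshold(z - grad / L, lam1 / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = b_new + ((t - 1.0) / t_new) * (b_new - b)  # Nesterov momentum
        b, t = b_new, t_new
    return b
```

The ℓ₁ part is handled by the proximal (soft-thresholding) step and the ridge part is folded into the smooth gradient, which is exactly the split that makes the updates explicit.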

3. Statistical Properties and Efficiency Bounds

SSLEMLEs uniformly provide rigorous statistical guarantees:

  • Consistency and asymptotic normality: For empirical likelihood stacking and unmatched MLEs, estimates satisfy central-limit-theorem-type results, e.g. $\sqrt{n}(\hat\theta-\theta^*) \to \mathcal N(0, V_h)$ (Wang et al., 18 Dec 2025) and $\sqrt{n+m}(\hat\beta_{n,m}-\beta_0) \to \mathcal N(0, \Sigma_{\rm SSL})$ (Balabdaoui et al., 27 Jan 2026).
  • Semiparametric efficiency: The empirical likelihood estimator attains the efficiency bound if the auxiliary moments $h$ span the “predictable” score component (Wang et al., 18 Dec 2025). In mixture models, the semi-supervised estimator is strictly more efficient than supervised logistic regression under label shift (Tian et al., 2023).
  • Guarantees on monotonicity: MCPL estimators guarantee that the semi-supervised solution is never worse, and often strictly better, in labeled log-likelihood than the supervised estimator, independent of cluster assumptions (Loog, 2015).
  • Finite-sample volume gain: In unmatched regression, explicit volume-ratio formulas $G$ quantify the improvement over the classical MLE, scaling as $O(1/\sqrt{\lambda})$ in the large-unlabeled regime (Balabdaoui et al., 27 Jan 2026).
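The asymptotic normality statements above translate directly into Wald-type intervals. A minimal stdlib-only sketch follows; the bisection-based normal quantile is purely an implementation convenience so the example needs no external dependencies.

```python
from math import sqrt, erf

def wald_ci(theta_hat, var_hat, n, level=0.95):
    """Wald interval implied by sqrt(n)(theta_hat - theta*) -> N(0, V):
    theta_hat +/- z * sqrt(V / n).  The normal quantile z is found by
    bisecting the standard-normal CDF, computed via erf."""
    target = 1.0 - (1.0 - level) / 2.0
    lo, hi = 0.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if 0.5 * (1.0 + erf(mid / sqrt(2.0))) < target:
            lo = mid
        else:
            hi = mid
    z = (lo + hi) / 2.0
    half = z * sqrt(var_hat / n)
    return theta_hat - half, theta_hat + half
```

Plugging in the semi-supervised variance ($V_h$ or $\Sigma_{\rm SSL}$) versus the supervised one makes the interval-length gain from unlabeled data directly visible.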

4. Paradigms in Application Domains

The SSLEMLE framework is applicable across numerous statistical learning scenarios:

  • Regression (GLM, linear, unmatched): penalized loss plus pseudo-likelihood, or mixture likelihood for unmatched data; e.g. $R(y_\ell \mid X_\ell;\beta) + \gamma_1 R(\bar y_\ell \mathbf{1} \mid T;\beta) + \lambda_1\|\beta\|_1 + \lambda_2\|\beta\|_2^2$ (Laria et al., 2020) and $\ell_{n,m}(\beta)$ (Balabdaoui et al., 27 Jan 2026)
  • Classification (mixture, exponential-tilt): stacked likelihood for labeled/unlabeled data, with an explicit mixture model and a profile over class proportions; e.g. $\ell(\beta, \pi_\ell, \pi_u, G_0)$ (Tian et al., 2023)
  • General inference with auxiliary predictions: empirical likelihood stacking with external-model moments; $\max_{w_k \ge 0} \prod_k w_k$ under the moment constraints
  • Distribution alignment (domain translation): entropic-OT-regularized likelihood matching paired/unpaired data; $L(\theta)$ as in (Persiianov et al., 2024)

Typical formulations impose minimal or no structural assumptions beyond convexity, regularity, or exponential family structure, with tuning parameters (e.g. $\gamma_1, \lambda_1, \lambda_2$) modulating the contribution of unlabeled data.

5. Theoretical Guarantees, Assumptions, and Limitations

Theoretical analysis of SSLEMLEs establishes:

  • Convexity and uniqueness: Estimators are unique and globally optimal when likelihoods are strictly concave or empirical likelihood constraints are full rank (Laria et al., 2020, Loog, 2015).
  • Monotonic improvement: For MCPL methods, the semi-supervised estimator strictly improves over the supervised solution with high probability in continuous covariate models (Loog, 2015).
  • Asymptotic efficiency: Covariance formulas are explicit in terms of Fisher information and auxiliary moment structure, with semiparametric bounds attainable if the moment family is sufficiently rich (Wang et al., 18 Dec 2025, Tian et al., 2023).
  • No cluster/manifold assumption: MCPL-based SSLEMLEs do not require explicit structure in unlabeled data; improvement is guaranteed under mild distributional regularity (Loog, 2015).
  • Label/unlabeled ratio dependence: Magnitude of efficiency gain scales inversely with the ratio of labeled to unlabeled data in regression (Balabdaoui et al., 27 Jan 2026).

6. Empirical Performance and Practical Considerations

Empirical studies consistently demonstrate SSLEMLE efficiency gains and enhanced predictive performance:

  • GLMs: s²net outperforms supervised elastic-net and other semi-supervised competitors in both regression and classification, on simulated and multiple real datasets (Laria et al., 2020).
  • Regression (unmatched data): SSLEMLE outperforms OLS and matched-sample MLE on synthetic and real datasets (Combined Cycle Power Plant); >90% improvement in test MSE observed with moderate to large unlabeled ratio (Balabdaoui et al., 27 Jan 2026).
  • Classification under label-shift: Variance and bias improvements over supervised logistic regression scale with divergence of class proportions; the gain vanishes when label proportions match (Tian et al., 2023).
  • General prediction-powered inference: SSLEMLE enables shorter confidence intervals and lower MSE, attaining nominal coverage and calibration (Wang et al., 18 Dec 2025).
  • Distribution translation (domain adaptation): SSLEMLE with entropic-OT formulation recovers multimodal conditionals and substantially enhances test log-likelihood over baseline methods; target-side unpaired data is particularly impactful (Persiianov et al., 2024).
  • Algorithmic stability: FISTA-based and Newton–Raphson schemes converge to unique minima, projection and gradient steps are computationally tractable at scale, and empirical results match theoretical predictions.

7. Extensions and Specialized Constructions

Recent works extend SSLEMLE methodology to complex domains:

  • Auxiliary construction: Basis expansions for auxiliary moments $h$ or data-driven cross-fitting ensure flexibility and efficiency (Wang et al., 18 Dec 2025).
  • Domain translation/inverse optimal transport: Energy-based models and entropic regularization unify likelihood maximization and optimal transport, allowing parameterization via Gaussian mixtures and closed-form normalization (Persiianov et al., 2024).
  • Mixture models/EM approaches: Semi-supervised EM iterations accelerate convergence rates compared to unsupervised EM, quantified by explicit contraction bounds (Sula et al., 2022). Labeled data can rescue EM from slow convergence in weak-identifiability regimes.
  • Unmatched regression: SSLEMLE methodology applies even when the link between features and response is not paired, expanding the range of applicable datasets and facilitating deconvolution-based approaches (Balabdaoui et al., 27 Jan 2026).
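The semi-supervised EM idea in the mixture-model bullet can be illustrated on a toy two-component Gaussian mixture with unit variance; the unit-variance assumption and all names below are illustrative, not the construction in the cited work.

```python
import numpy as np

def semisupervised_em(x_lab, y_lab, x_unlab, iters=100):
    """Toy semi-supervised EM for a 1-D two-component Gaussian mixture
    with unit variance: labeled points contribute their known component
    directly, unlabeled points enter through posterior responsibilities.
    Shows how labels anchor the M-step and accelerate convergence."""
    # Initialize from the labeled data alone.
    mu = np.array([x_lab[y_lab == 0].mean(), x_lab[y_lab == 1].mean()])
    pi = np.array([np.mean(y_lab == 0), np.mean(y_lab == 1)])
    for _ in range(iters):
        # E-step on unlabeled data: responsibilities under current params.
        dens = pi * np.exp(-0.5 * (x_unlab[:, None] - mu[None, :]) ** 2)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step pools hard labeled counts with soft unlabeled counts.
        for j in (0, 1):
            lab_mask = (y_lab == j)
            w = lab_mask.sum() + resp[:, j].sum()
            mu[j] = (x_lab[lab_mask].sum() + (resp[:, j] * x_unlab).sum()) / w
            pi[j] = w / (len(x_lab) + len(x_unlab))
    return mu, pi
```

Because the labeled terms in the M-step never move, they act as a fixed anchor that keeps the iterates away from the poorly identified regions where purely unsupervised EM slows down.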

These developments substantiate SSLEMLE as a unified, principled approach to semi-supervised estimation, encompassing supervised, unsupervised, and prediction-powered regimes, with concrete theoretical and empirical guarantees across diverse statistical and machine learning domains.
