Semiparametric Full-Likelihood Estimators
- Semiparametric full-likelihood estimators are statistical methods that efficiently estimate finite-dimensional parameters in models with infinite-dimensional nuisance functions.
- They employ profiling and sieve approximations, such as kernel smoothing and spline expansions, to manage nonparametric components while preserving efficiency.
- These techniques are applied in high-dimensional data analysis, missing data problems, and complex dependencies, ensuring robust inference and bias control.
Semiparametric full-likelihood estimators are a class of statistical procedures that achieve efficient estimation of finite-dimensional parameters in models containing infinite-dimensional nuisance functions or nonparametric components, by leveraging the full likelihood structure and appropriate profiling or regularization strategies. These methods are rigorous extensions of classical likelihood techniques, designed to attain the semiparametric efficiency bound even when key aspects of the model, such as baseline densities or selection mechanisms, are modeled nonparametrically. They are fundamental in high-dimensional data analysis, missing data problems, density ratio models, regression with censored data, and models with complex dependence structures, where fully parametric assumptions for all model components are infeasible or undesirable.
1. Model Frameworks and Semiparametric Structure
Semiparametric full-likelihood estimators operate within models containing both finite-dimensional parameters of interest and infinite-dimensional nuisance functions. A canonical example is the semiparametric exponential family:

$$f(x; \theta, g) = \frac{\exp\{\theta^\top T(x)\}\, g(x)}{Z(\theta, g)}, \qquad Z(\theta, g) = \int \exp\{\theta^\top T(x)\}\, g(x)\, dx,$$

where $\theta$ is the parameter of interest, $T(x)$ is a known vector of sufficient statistics, and $g$ is a completely unspecified, nonparametric nuisance function. The partition function $Z(\theta, g)$ ensures normalization. Identifiability is ensured by constraints such as $\int g(x)\, dx = 1$ and $g(x) > 0$ for all $x$ in a compact support (Lin et al., 2017).
Other generic models include proportional likelihood-ratio models (Goldberg et al., 2019), semiparametric mixture models with log-concave densities (Zhou et al., 2019), selection models for non-ignorable missing data (Liu et al., 2019), sieve likelihood models for censored and truncated data (Matthews et al., 10 Apr 2025), copula-based multivariate models (Medovikov et al., 2024), and regressions with flexible spline components (Michelot et al., 2013).
Across all these models, the goal is to estimate $\theta$ efficiently without over-constraining $g$, while maintaining practical identifiability and robustness.
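As a concrete illustration of how such a likelihood is evaluated, the nuisance $g$ can be represented by atoms placed at the observed points (an empirical-likelihood-style discretization), which turns the partition function into a finite sum. The sketch below assumes a scalar observation with $T(x) = x$; the function name and setup are illustrative, not taken from the cited papers:

```python
import numpy as np

def exp_family_loglik(theta, x, weights):
    """Log-likelihood of the semiparametric exponential family
    f(x; theta, g) = exp(theta * T(x)) g(x) / Z(theta, g), with T(x) = x
    and g represented by atoms `weights` at the observed points `x`.
    Z(theta, g) then reduces to a finite sum over the atoms."""
    log_g = np.log(weights)
    tilt = theta * x                               # theta^T T(x) with T(x) = x
    log_Z = np.logaddexp.reduce(tilt + log_g)      # log sum_j exp(theta x_j) g_j
    return np.sum(tilt + log_g - log_Z)

x = np.array([0.1, 0.5, 1.2, 2.0])
w = np.full(4, 0.25)                # uniform atoms summing to one
ll0 = exp_family_loglik(0.0, x, w)  # at theta = 0 the tilt vanishes and Z = 1
```

At $\theta = 0$ the exponential tilt disappears, so the log-likelihood reduces to the sum of the atom log-masses, which gives a quick correctness check.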
2. Construction of Semiparametric Full-Likelihood Estimators
The core methodology for semiparametric full-likelihood estimation involves four intertwined steps:
- Writing the Full Likelihood: The joint log-likelihood is written to include both $\theta$ and the full nonparametric component $g$, for all observed data.
- Profiling or Sieve Approximation: The infinite-dimensional nuisance function is profiled out via the least favorable curve approach, or approximated using sieve expansions (e.g., B-splines, Bernstein polynomials, empirical likelihood atoms) (Lin et al., 2017, Guan, 2021, Michelot et al., 2013, Matthews et al., 10 Apr 2025). The least favorable curve $g_\theta$ satisfies the functional score-zero condition, leading to explicit or numerically tractable representations of the profile log-likelihood:

$$\ell_p(\theta) = \ell(\theta, g_\theta), \qquad \frac{\partial}{\partial g}\, \ell(\theta, g)\Big|_{g = g_\theta} = 0.$$
- Maximizing the Profile Likelihood: The resulting profile log-likelihood is maximized with respect to $\theta$, often via efficient gradient-based algorithms.
- Efficient Score and Information: Because $g_\theta$ solves the functional profile equations, the efficient score for $\theta$ is simply the gradient $\dot{\ell}_p(\theta) = \partial_\theta\, \ell(\theta, g_\theta)$ evaluated at the observed data. The semiparametric Fisher information is $I^*(\theta) = E\big[\dot{\ell}_p(\theta)\, \dot{\ell}_p(\theta)^\top\big]$ (Lin et al., 2017, Goldberg et al., 2019, Matthews et al., 10 Apr 2025).
These steps generalize to the semiparametric proportional likelihood ratio (Goldberg et al., 2019), spline-sieve likelihoods (Matthews et al., 10 Apr 2025), empirical likelihood schemes in mixture and semicontinuous models (Yuan et al., 2020), and sieve copula estimators in multivariate models (Medovikov et al., 2024).
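The profiling logic above can be illustrated with a toy parametric analogue in which the nuisance (here a scalar variance) is profiled out in closed form before the outer maximization. This is an illustrative sketch of the profile-then-maximize pattern, not the estimator of any cited paper:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def profile_loglik(mu, x):
    """Profile log-likelihood for the mean of a normal sample.
    The nuisance sigma^2 is profiled out in closed form for each mu,
    mimicking the least-favorable-curve step g -> g_theta."""
    n = len(x)
    s2_hat = np.mean((x - mu) ** 2)          # inner maximizer over sigma^2
    return -0.5 * n * (np.log(s2_hat) + 1.0)

rng = np.random.default_rng(0)
x = rng.normal(loc=1.5, scale=2.0, size=500)

# Outer maximization of the profile log-likelihood over the parameter of interest.
res = minimize_scalar(lambda m: -profile_loglik(m, x), bounds=(-5, 5), method="bounded")
mu_hat = res.x                               # coincides with the sample mean here
```

In this toy case the profile maximizer equals the sample mean, which makes the two-stage structure easy to verify; in genuinely semiparametric models the inner step is a functional optimization rather than a closed-form plug-in.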
3. Algorithmic Implementation and Computational Aspects
Semiparametric full-likelihood algorithms routinely employ kernel or sieve smoothing for nonparametric components, EM-type iterations for mixtures or missing-data structures, and high-dimensional optimization routines. Key examples include:
- Nadaraya–Watson estimator for the least favorable curve $g_\theta$ and its normalization for computing $Z(\theta, g_\theta)$ (Lin et al., 2017).
- Spline sieves: Nonparametric functions approximated by basis expansions $\sum_{k=1}^{K} \beta_k B_k(\cdot)$, jointly optimized alongside $\theta$ (Matthews et al., 10 Apr 2025, Michelot et al., 2013).
- Bernstein polynomials: Used for robust approximation of baseline densities in two-sample DRM and copula models (Guan, 2021, Medovikov et al., 2024).
- Empirical likelihood atomic masses: Discrete nuisance distributions, with atoms at the observed support points, formulated via Lagrangian constraints for multimodal or semicontinuous populations (Yuan et al., 2020).
- Fractional imputation EM: For non-ignorable missingness, missing values are multiply imputed, weighted by the nonparametric response mechanism, and the full likelihood is maximized alternately over $\theta$ and the response mechanism (Sang et al., 2018).
- Full semiparametric likelihood for missing data: Empirical likelihood over the marginal distribution, alternating Lagrange-multiplier and parameter updates (Liu et al., 2019).
- Deep neural network integration: Profiling out the nonparametric baseline hazard in frailty models and training via back-propagation with penalized profile h-likelihood (Lee et al., 2023).
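As a minimal illustration of the kernel-smoothing step listed above, a Nadaraya–Watson estimator can be written in a few lines of NumPy; the Gaussian kernel and bandwidth here are illustrative choices:

```python
import numpy as np

def nadaraya_watson(x_eval, x, y, h):
    """Nadaraya-Watson estimate m(x) = sum_i K_h(x - x_i) y_i / sum_i K_h(x - x_i)
    with a Gaussian kernel; normalizing constants cancel in the ratio."""
    d = (x_eval[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * d ** 2)            # Gaussian kernel weights
    return (K @ y) / K.sum(axis=1)

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 400)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 400)

grid = np.linspace(0.1, 0.9, 9)          # interior points, away from boundary bias
m_hat = nadaraya_watson(grid, x, y, h=0.05)
```

With a moderate sample size and a small bandwidth, the estimate tracks the smooth regression function closely away from the boundary; bandwidth selection is where the tuning discussed above enters.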
The cost of each likelihood evaluation is dominated by the smoothing step (quadratic in $n$ for kernel methods, linear in $n$ times the sieve dimension for sieve methods), and EM/sieve algorithms converge rapidly when the sieve dimension and kernel parameters are properly chosen.
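The Bernstein-polynomial sieve for densities on $[0,1]$ is similarly compact: a degree-$(m-1)$ expansion is a mixture of Beta densities with nonnegative weights summing to one. A minimal sketch (the helper name is illustrative):

```python
import numpy as np
from scipy.stats import binom

def bernstein_density(u, weights):
    """Density on [0, 1] in the Bernstein basis of degree m-1:
    f(u) = sum_k w_k * Beta(k+1, m-k) density, with w_k >= 0 summing to one.
    The Beta(k+1, m-k) density equals m * C(m-1, k) u^k (1-u)^(m-1-k)."""
    m = len(weights)
    k = np.arange(m)
    basis = m * binom.pmf(k[None, :], m - 1, u[:, None])  # (len(u), m) basis matrix
    return basis @ weights

u = np.linspace(0.1, 0.9, 5)
f_unif = bernstein_density(u, np.full(6, 1.0 / 6))  # uniform weights -> uniform density
```

Uniform weights recover the uniform density exactly, a convenient sanity check; in the cited estimators the weights are fitted by (penalized) maximum likelihood or EM under the simplex constraint.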
4. Asymptotic Theory and Semiparametric Efficiency
Semiparametric full-likelihood estimators achieve $\sqrt{n}$-consistency and asymptotic normality for $\hat{\theta}_n$ under standard regularity conditions: compactness of the parameter space, identifiability, smoothness of $g$, and bounded covariate distributions.
The canonical limit result

$$\sqrt{n}\,(\hat{\theta}_n - \theta_0) \xrightarrow{d} N\!\big(0,\; I^*(\theta_0)^{-1}\big)$$

is established for the profile likelihood in the semiparametric exponential family (Lin et al., 2017), Z-estimator constructions in likelihood ratio models (Goldberg et al., 2019), sieve likelihood estimators for censored/truncated regression (Matthews et al., 10 Apr 2025), and Bernstein–von Mises Bayes procedures under symmetric error (Chae, 2015). The sieve MLE attains the semiparametric bound for smooth functionals of $(\theta, g)$, and empirical likelihood approaches can be used to construct confidence intervals and perform hypothesis testing with chi-squared limiting distributions for linear or smooth functionals (Yuan et al., 2020).
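The correspondence between profile-likelihood curvature and the efficient information can be checked numerically in a toy model where both are available in closed form (normal mean with the variance profiled out). This is an illustrative sanity check, not a procedure from the cited works:

```python
import numpy as np

# Toy check: the curvature of the profile log-likelihood at its maximum
# should match n times the efficient information 1 / sigma^2.
rng = np.random.default_rng(2)
sigma = 2.0
x = rng.normal(0.0, sigma, size=5000)
n = len(x)

def ell_p(mu):
    """Profile log-likelihood for mu, sigma^2 profiled out (constants dropped)."""
    return -0.5 * n * np.log(np.mean((x - mu) ** 2))

mu_hat = x.mean()
eps = 1e-3
# Central-difference second derivative of ell_p at its maximizer.
curv = (ell_p(mu_hat + eps) - 2 * ell_p(mu_hat) + ell_p(mu_hat - eps)) / eps ** 2
info_per_obs = -curv / n   # should approximate 1 / sigma^2 = 0.25
```

The agreement of the numerical curvature with $1/\sigma^2$ is exactly the statement that the inverse profile information estimates the efficient asymptotic variance.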
Comparative simulation studies demonstrate that these methods consistently match or outperform classical conditional-likelihood estimators, pseudo-likelihood approaches, and fully parametric MLEs under misspecification, particularly in high-dimensional, nonstandard, or data-missing settings (Lin et al., 2017, Goldberg et al., 2019, Liu et al., 2019, Medovikov et al., 2024, Yuan et al., 2020).
5. Applications, Robustness, and Empirical Performance
Semiparametric full-likelihood methods have demonstrated robustness and efficiency across a spectrum of data contexts, including:
- Linear and generalized linear models with unknown error distribution or base measure: negligible bias, robust variance control, full efficiency even under model deviations (Lin et al., 2017, Matthews et al., 10 Apr 2025, Lee et al., 2022).
- Handling non-ignorable missingness: Properly designed full-likelihood estimators with empirical likelihood over the marginal covariate distribution provide identifiability and efficiency without auxiliary instrumental variables or restrictive assumptions (Liu et al., 2019, Sang et al., 2018).
- Density-ratio and mixture models: Bernstein polynomial EM estimators yield smooth, boundary-adaptive densities and efficient inference of mixture proportions and component functionals (Guan, 2021, Yuan et al., 2020, Zhou et al., 2019).
- Mark-recapture and survival models with flexible baseline probability structures: Penalized spline full-likelihood recovers nonlinear survival–covariate associations and enables data-driven smoothing parameter selection (Michelot et al., 2013, Matthews et al., 10 Apr 2025).
- Multilevel and frailty models: Profiling-out nonparametric hazard functions combined with h-likelihood and deep networks enables scalable and unbiased inference for complex clustered event data (Lee et al., 2023).
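A penalized-spline fit of the kind used in these models can be sketched with a truncated-power basis and a ridge penalty on the knot coefficients; the basis, knot placement, and penalty level here are illustrative choices:

```python
import numpy as np

def pspline_fit(x, y, knots, lam):
    """Penalized regression spline with a truncated-power basis
    [1, x, (x - k_1)_+, ..., (x - k_K)_+]. The ridge penalty lam acts
    only on the knot coefficients, controlling smoothness."""
    B = np.column_stack([np.ones_like(x), x] +
                        [np.maximum(x - k, 0.0) for k in knots])
    P = np.diag([0.0, 0.0] + [1.0] * len(knots))     # penalize knot terms only
    beta = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
    return B, beta

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 300))
y = np.cos(3 * x) + rng.normal(0, 0.1, 300)

B, beta = pspline_fit(x, y, knots=np.linspace(0.1, 0.9, 15), lam=1.0)
fitted = B @ beta
```

In the cited applications the penalty level is chosen data-adaptively (e.g., by cross-validation or marginal likelihood) rather than fixed as here.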
Simulations and real-data analyses—e.g., wage studies with extensive missingness, insurance loss modeling, biological mixture identification, and multi-state disease progression—consistently demonstrate efficiency improvement, robust bias control, and proper coverage under the proposed full-likelihood semiparametric estimators (Lin et al., 2017, Liu et al., 2019, Medovikov et al., 2024, Yuan et al., 2020, Michelot et al., 2013).
6. Variants, Extensions, and Future Directions
Semiparametric full-likelihood techniques continue to expand into more complex modeling regimes, including:
- High-dimensional settings and penalization: Adaptive selection of smoothing, knot dimension, or sieve basis enables extension to settings with many covariates or response classes (Michelot et al., 2013, Matthews et al., 10 Apr 2025).
- Copula-based dependence modeling: Sieve MLE for unspecified copulas achieves marginal efficiency and avoids the bias incurred by misspecified parametric copula families (Medovikov et al., 2024).
- Nonstandard data structures: Empirical likelihood and full-likelihood approaches extend naturally to interval-censored multi-state models, semicontinuous populations, and selection-biased sampling (Gu et al., 2022, Yuan et al., 2020).
- Integration with machine learning: Deep neural network integration, as in semiparametric frailty models, enables flexible nonlinear modeling while retaining full-likelihood efficiency properties (Lee et al., 2023).
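In the spirit of the sieve copula estimators above, a Bernstein-smoothed empirical copula can be sketched as follows; the degree and helper names are illustrative:

```python
import numpy as np
from scipy.stats import binom, rankdata

def bernstein_copula(u, v, data, m=10):
    """Empirical Bernstein copula: the empirical copula C_n, evaluated on an
    (m+1) x (m+1) grid, is smoothed with binomial (Bernstein) weights of
    degree m in each coordinate."""
    n = len(data)
    r = rankdata(data[:, 0]) / (n + 1)       # pseudo-observations in (0, 1)
    s = rankdata(data[:, 1]) / (n + 1)
    grid = np.arange(m + 1) / m
    Cn = np.array([[np.mean((r <= a) & (s <= b)) for b in grid] for a in grid])
    pu = binom.pmf(np.arange(m + 1), m, u)   # Bernstein weights in u
    pv = binom.pmf(np.arange(m + 1), m, v)   # Bernstein weights in v
    return pu @ Cn @ pv

rng = np.random.default_rng(4)
data = rng.uniform(0, 1, size=(500, 2))      # independent margins
c_mid = bernstein_copula(0.5, 0.5, data)     # approx. 0.5 * 0.5 under independence
c_top = bernstein_copula(1.0, 1.0, data)     # exactly 1 at the upper corner
```

For independent uniform margins the smoothed copula is close to the product copula $uv$, and it respects the boundary conditions $C(1, 1) = 1$ by construction.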
Continued methodological research focuses on theoretical properties under progressively weaker assumptions, efficient algorithms for ultra-high-dimensional and large-scale datasets, and inferential procedures under various forms of missingness, censoring, and measurement error.
7. Summary Table of Representative Models and Methodologies
| Model/Context | Full-Likelihood Approach | Key Computational Strategy |
|---|---|---|
| Semiparametric exponential family | Profile likelihood via least favorable curve g↦g_θ | Smoothing, closed-form profiling |
| Proportional likelihood ratio | Projection onto tangent space, Z-estimators | Neumann series, IPW weighting |
| Sieve-likelihood for truncated/censored data | Spline basis expansion of baseline hazard | Newton–Raphson, spline sieve expansion |
| Multivariate copula models | Sieve MLE with Bernstein–Kantorovich copula | Constrained optimization, simplex parameterization |
| Semiparametric mixture/log-concave models | EM with nonparametric log-concave MLE | Active-set, EM iterations |
| Empirical likelihood for semicontinuous models | EL with dual profile likelihood for functionals | Lagrangian, atomic masses, Newton–Raphson |
| Frailty models/deep learning | Negative profiled h-likelihood with profiled-out baseline | Back-propagation, normalization, alternating minimization |
These strategies are foundational for semiparametric efficiency and practical applicability in complex statistical inference.