
Empirical Likelihood Estimation under CMR

Updated 12 January 2026
  • The paper introduces a semiparametric framework that casts estimation as an infinite-dimensional empirical likelihood problem to achieve the efficiency bound.
  • It employs RKHS-, neural-network-, and sieve-based approximations to rigorously enforce conditional moment restrictions.
  • Empirical studies demonstrate that these methods outperform traditional estimators by significantly reducing mean squared error.

Empirical-likelihood (EL) estimators under conditional moment restrictions (CMR) form a foundational framework for inference in semiparametric econometrics, statistical machine learning, and causal inference. The central insight is that, in models identified by conditional moments, estimation can be cast as an infinite-dimensional empirical-likelihood problem, leading to procedures that achieve efficiency bounds, enjoy robust small-sample properties, and take advantage of function-approximation frameworks such as reproducing kernel Hilbert spaces (RKHS) and neural networks. Below, the mathematical setup, principal methodologies, asymptotic properties, and implementation details are developed, referencing central developments (Kremer et al., 2022, Chaumaray et al., 2020, Kremer et al., 2023, Chib et al., 2021).

1. Statistical Formulation and Conditional Moment Restrictions

Suppose $\{(X_i,Z_i)\}_{i=1}^n$ are i.i.d. draws from an unknown law $P_{X,Z}$, $\theta \in \Theta \subseteq \mathbb{R}^p$ is the finite-dimensional parameter of interest, and $\psi: \mathcal{X} \times \Theta \to \mathbb{R}^m$ is a prescribed moment function. The conditional moment restriction stipulates

$$\mathbb{E}[\psi(X;\theta_0) \mid Z] = 0 \quad P_Z\text{-a.s.}$$

for a unique $\theta_0 \in \Theta$. This model class generalizes classical mean regression and instrumental-variable settings, incorporating nonparametric or semiparametric nuisance components as needed (Kremer et al., 2022, Chib et al., 2021).

A key equivalence, via the law of iterated expectations, is

$$\mathbb{E}\left[w(Z)^{\top}\psi(X;\theta_0)\right]=0 \quad \forall\, w:\mathcal{Z}\to\mathbb{R}^m,$$

yielding an infinite system of unconditional moment restrictions, indexed by test functions $w$. The solution set can be represented abstractly as the vanishing of a functional on a Hilbert space $\mathcal{H}$:

$$E_{P_0}\left[\Psi(X,Z;\theta_0)\right] = 0, \quad \text{with } \Psi(X,Z;\theta)[h] = \psi(X;\theta)^{\top} h(Z) \;\;\forall h \in \mathcal{H}$$

(Kremer et al., 2022, Kremer et al., 2023).
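The equivalence can be checked by simulation. Below is a small numerical illustration (the heteroskedastic data-generating process and the dictionary of test functions are our own choices, not taken from the cited papers): every induced unconditional moment is near zero at $\theta_0$ and bounded away from zero at a wrong parameter value.

```python
# Monte Carlo check of E[psi(X; theta0) | Z] = 0  <=>  E[w(Z) psi(X; theta0)] = 0.
import numpy as np

rng = np.random.default_rng(0)
n, theta0 = 200_000, 1.5

Z = rng.uniform(0.0, 1.0, n)
Y = theta0 * Z + (0.5 + Z) * rng.standard_normal(n)  # heteroskedastic noise, mean 0 given Z

def psi(theta):
    """Moment function psi(X; theta) = Y - theta * Z for the regression model."""
    return Y - theta * Z

# A few test functions w: each induces one unconditional moment E[w(Z) psi].
test_functions = {"1": np.ones(n), "z": Z, "z^2": Z**2, "sin(z)": np.sin(Z)}

for name, w in test_functions.items():
    at_true = np.mean(w * psi(theta0))   # ~ 0 at theta0
    at_wrong = np.mean(w * psi(0.0))     # biased away from 0
    print(f"w={name:7s}  E_n[w psi(theta0)]={at_true:+.4f}  E_n[w psi(0)]={at_wrong:+.4f}")
```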

2. Functional Generalized Empirical Likelihood Framework

Generalized empirical likelihood (GEL) seeks an alternative probability measure $P \ll \widehat{P}_n = n^{-1} \sum_{i=1}^n \delta_{(X_i,Z_i)}$ that (i) strictly enforces the continuum of moment constraints and (ii) incurs minimal divergence from the empirical measure. For a convex function $\varphi$ generating the divergence $D_\varphi$, one solves

$$R(\theta) = \inf_{P\ll \widehat{P}_n} \Big\{ D_\varphi(P \,\Vert\, \widehat{P}_n) \;\Big|\; E_P[\Psi(X,Z;\theta)] = 0 \Big\}.$$

For the original empirical likelihood, $\varphi(p) = -2\log p$, the primal problem is

$$\min_{\{p_i\}:\,\sum_i p_i=1,\; p_i\geq 0} \;\sum_{i=1}^n -2\log(n p_i) \quad \text{s.t.} \quad \sum_{i=1}^n p_i\,\psi(X_i;\theta)^{\top} h(Z_i) = 0 \;\;\forall h\in\mathcal{H},$$

where the constraints span an infinite-dimensional space (Kremer et al., 2022).

The dual emerges by introducing a Lagrange-multiplier function $\lambda:\mathcal{Z} \to \mathbb{R}^m$:

$$R(\theta) = \sup_{\lambda(\cdot)} \left\{ \frac{1}{n}\sum_{i=1}^n \log\left(1+\lambda(Z_i)^{\top}\psi(X_i;\theta)\right) \right\},$$

possibly with an RKHS-norm or $L_2$ regularizer on $\lambda$. For general $\varphi$, the dual takes the form

$$\sup_{h\in\mathcal{H}} \left\{ -\frac{1}{n} \sum_{i=1}^n \varphi^*\big(\Psi_i(\theta)[h]\big) - \lambda_n \|h\|_{\mathcal{H}} \right\},$$

where $\varphi^*$ is the convex conjugate; for EL, $\varphi^*(v) = -\log(1-v)$ (Kremer et al., 2022).
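The finite-dimensional analogue of this dual is easy to compute directly. The sketch below (hypothetical data; the dictionary $\{1, z, z^2\}$ stands in for the full space $\mathcal{H}$) profiles Owen-style empirical likelihood by maximizing the log criterion over the multiplier $\lambda$: the profile is near zero at $\theta_0$ and grows when the constraints are violated.

```python
# EL dual with a finite dictionary of test functions in place of the full space H.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, theta0 = 2_000, 1.5
Z = rng.uniform(0.0, 1.0, n)
Y = theta0 * Z + (0.5 + Z) * rng.standard_normal(n)

def g(theta):
    """Unconditional moments g_i = [h_k(Z_i) * psi_i] for the dictionary {1, z, z^2}."""
    psi = Y - theta * Z
    return np.column_stack([psi, Z * psi, Z**2 * psi])      # shape (n, 3)

def el_profile(theta):
    """R(theta) = sup_lambda (1/n) sum_i log(1 + lambda' g_i(theta))."""
    G = g(theta)
    def neg_dual(lam):
        v = 1.0 + G @ lam
        return np.inf if np.any(v <= 1e-10) else -np.mean(np.log(v))  # guard log domain
    res = minimize(neg_dual, np.zeros(G.shape[1]), method="Nelder-Mead")
    return -res.fun

print(f"R(theta0) = {el_profile(theta0):.4f}")   # close to 0
print(f"R(0)      = {el_profile(0.0):.4f}")      # clearly positive
```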

3. Asymptotic Properties and Efficiency

Under compactness of $\Theta$, continuity of $\psi$, non-singularity of

$$\Omega_0 = E[\Psi \otimes \Psi], \qquad \Sigma_0 = \langle E[\nabla_\theta \Psi],\, E[\nabla_\theta \Psi]\rangle_{\mathcal{H}^*},$$

and uniform Donsker conditions on the class $\Psi(\cdot;\theta)[h]$, one has:

  • Consistency:

$$\hat{\theta} = \arg\min_\theta \sup_{h\in\mathcal{H}} \left\{ -\frac{1}{n}\sum_{i=1}^n \varphi^*\big(\psi(X_i;\theta)^{\top} h(Z_i)\big) - \frac{\lambda_n}{2}\|h\|_{\mathcal{H}}^2 \right\} \xrightarrow{p} \theta_0$$

with $\lambda_n \to 0$ at rate $O(n^{-\xi})$, $\xi < 1/2$.

  • Asymptotic normality:

$$\sqrt{n}\,(\hat{\theta}-\theta_0) \overset{d}{\longrightarrow} N(0,\Sigma_\theta), \qquad \Sigma_\theta = \left(\nabla_\theta\Psi_0\,\Omega_0^{-1}\,\nabla_\theta\Psi_0^{*}\right)^{-1},$$

which coincides with the semiparametric efficiency bound of Chamberlain (1987) (Kremer et al., 2022, Kremer et al., 2023, Chib et al., 2021).

In settings where sieve-based or kernel-based approximations are used, the correct growth rate of the sieve dimension (e.g., $k_n = o(n^{1/6})$ under correct specification) is necessary to guarantee efficiency (Chib et al., 2021).

4. Solution Strategies and Computation

  • RKHS-based implementation:

Let $\mathcal{H}$ be the RKHS of a universal, strictly positive-definite kernel $k$ on $\mathcal{Z}$. By the representer theorem, the maximizer $h^*$ has the form

$$h^*(z) = \sum_{j=1}^n \alpha_j k(Z_j, z),$$

reducing the infinite-dimensional optimization over $h$ to a finite problem in $\alpha \in \mathbb{R}^n$. Algorithmic steps include alternating or simultaneous maximization over $\alpha$ and minimization over $\theta$ (using, e.g., L-BFGS), leveraging Danskin's theorem for gradient computations (Kremer et al., 2022, Kremer et al., 2023).
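As a concrete instance of this reduction, the following sketch (entirely illustrative: data-generating process, kernel bandwidth, and $\lambda_n$ are our own choices) runs the EL variant with a Gaussian kernel, using Owen's quadratic extension of the logarithm to keep the inner problem smooth and globally defined.

```python
# RKHS dual via the representer theorem: h(z) = sum_j alpha_j k(Z_j, z),
# so the inner sup becomes a concave problem in alpha in R^n.
import numpy as np
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(2)
n, theta0, lam_n = 300, 1.5, 1.0
Z = rng.uniform(0.0, 1.0, n)
Y = theta0 * Z + (0.5 + Z) * rng.standard_normal(n)
K = np.exp(-0.5 * (Z[:, None] - Z[None, :])**2 / 0.25**2)   # Gaussian Gram matrix

def log_star(v, eps=1e-3):
    """log(v) for v >= eps, quadratically extended below (Owen's trick)."""
    return np.where(v >= eps, np.log(np.maximum(v, eps)),
                    np.log(eps) - 1.5 + 2.0 * v / eps - v**2 / (2.0 * eps**2))

def dlog_star(v, eps=1e-3):
    return np.where(v >= eps, 1.0 / np.maximum(v, eps), 2.0 / eps - v / eps**2)

def profile(theta):
    """sup over alpha of (1/n) sum log*(1 + psi_i (K alpha)_i) - (lam_n/2) a'Ka."""
    psi = Y - theta * Z
    def neg_obj(alpha):
        v = 1.0 + psi * (K @ alpha)
        f = -(np.mean(log_star(v)) - 0.5 * lam_n * alpha @ (K @ alpha))
        grad = -(K @ (psi * dlog_star(v)) / n - lam_n * (K @ alpha))
        return f, grad
    res = minimize(neg_obj, np.zeros(n), jac=True, method="L-BFGS-B")
    return -res.fun

# Outer minimization over the scalar theta; Danskin's theorem would supply the
# gradient, but a bounded scalar search suffices in one dimension.
theta_hat = minimize_scalar(profile, bounds=(0.0, 3.0), method="bounded").x
print(f"theta_hat = {theta_hat:.3f}")
```

The inner problem is concave in $\alpha$, so L-BFGS with an analytic gradient is reliable; for a multi-dimensional $\theta$ one would replace the scalar search with gradient steps on the envelope function.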

  • Neural network-based implementation:

Parametrize the dual function $\lambda(z) = h_\omega(z)$ by a feed-forward neural network. The GEL criterion becomes

$$\min_{\theta}\;\max_{\omega}\; -\frac{1}{n}\sum_{i=1}^n \varphi^*\big(\psi(X_i;\theta)^{\top} h_\omega(Z_i)\big) - \frac{\lambda_n}{2n}\sum_{i=1}^n \|h_\omega(Z_i)\|^2$$

Training employs stochastic min-max solvers suited for nonconvex-concave games (e.g., Optimistic Adam) (Kremer et al., 2022).
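A stripped-down stand-in for this training loop is sketched below. To keep it dependency-free and numerically stable, it makes two simplifications (ours, not the papers'): the hidden layer is replaced by frozen random Fourier features, so only the output weights $\omega$ are trained, and a quadratic ($\chi^2$-type) GEL criterion replaces the EL logarithm so every update is a plain gradient step.

```python
# Alternating gradient descent-ascent on the GEL min-max game with a
# random-feature "network" h_omega(z) = omega' phi(z).
import numpy as np

rng = np.random.default_rng(3)
n, D, theta0, lam_n = 1_000, 64, 1.5, 1.0
Z = rng.uniform(0.0, 1.0, n)
Y = theta0 * Z + (0.5 + Z) * rng.standard_normal(n)

# Frozen random Fourier features play the role of the network's hidden layer.
Wf = rng.normal(0.0, 1.0 / 0.25, D)                   # frequencies, bandwidth 0.25
Bf = rng.uniform(0.0, 2.0 * np.pi, D)
Phi = np.sqrt(2.0 / D) * np.cos(Z[:, None] * Wf[None, :] + Bf[None, :])  # (n, D)

theta, omega = 0.0, np.zeros(D)
for _ in range(500):                                  # outer descent on theta
    psi = Y - theta * Z
    for _ in range(40):                               # inner ascent on omega
        h = Phi @ omega
        v = psi * h
        # grad of (1/n) sum (v_i - v_i^2/2) - (lam_n/2n) sum h_i^2 w.r.t. omega
        omega += 0.1 * (Phi.T @ ((1.0 - v) * psi - lam_n * h) / n)
    h = Phi @ omega
    theta -= 1.0 * (-np.mean((1.0 - psi * h) * Z * h))  # Danskin-style gradient step
print(f"theta = {theta:.3f}")
```

In practice the plain alternating steps would be replaced by a stochastic min-max solver such as Optimistic Adam over minibatches, with a trained (not frozen) network.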

  • Sieve-based and ETEL approach:

Approximate the CMR via a finite sieve of basis functions $\{\varphi_j(z)\}_{j=1}^{k_n}$, expanding the unconditional moments as $g_i(\theta) = [\varphi_1(Z_i)\psi(X_i;\theta),\ldots,\varphi_{k_n}(Z_i)\psi(X_i;\theta)]'$. Optimization proceeds via Newton or quasi-Newton solvers in the inner loop (for the dual/tilting parameters) and standard optimizers in the outer loop (for $\theta$) (Chib et al., 2021).
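A minimal worked version of this pipeline (hypothetical data; $k_n = 3$ basis functions) computes the exponential-tilting weights in the inner loop and profiles the ETEL criterion in the outer loop.

```python
# Sieve/ETEL sketch: tilting parameter eta in the inner loop, theta in the outer.
import numpy as np
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(4)
n, theta0 = 1_000, 1.5
Z = rng.uniform(0.0, 1.0, n)
Y = theta0 * Z + (0.5 + Z) * rng.standard_normal(n)

def g(theta):
    """Sieve moments: basis {1, z, z^2} times psi(X; theta) = Y - theta Z."""
    psi = Y - theta * Z
    return np.column_stack([psi, Z * psi, Z**2 * psi])       # (n, k_n)

def neg_etel(theta):
    """Negative ETEL profile: fit tilting parameter eta, then return -sum log p_i."""
    G = g(theta)
    inner = lambda eta: np.mean(np.exp(np.clip(G @ eta, -50, 50)))  # convex in eta
    eta = minimize(inner, np.zeros(G.shape[1]), method="BFGS").x
    logw = G @ eta
    logp = logw - np.log(np.sum(np.exp(logw)))               # log p_i, normalized
    return -np.sum(logp)

theta_hat = minimize_scalar(neg_etel, bounds=(0.0, 3.0), method="bounded").x
print(f"theta_hat = {theta_hat:.3f}")
```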

5. Key Variants and Theoretical Extensions

  • Kernel Method of Moments (KMM):

KMM replaces the divergence penalty in the GEL functional by a maximum mean discrepancy (MMD) between a candidate law and the empirical law, together with an entropy regularization term. This allows candidate distributions to place mass "off" the empirical data, yielding:

$$R_\epsilon^\varphi(\theta) = \inf_{P \ll \omega} \left\{ \tfrac{1}{2}\,\mathrm{MMD}^2(P, \hat{P}_n; \mathcal{F}) + \epsilon\, D_\varphi(P \,\|\, \omega) \right\} \quad \text{s.t. } E_P[\Psi(X,Z;\theta)] = 0$$

Dual representations, representer-theorem reductions, and practical stochastic gradient algorithms are employed. KMM achieves semiparametric efficiency and offers flexibility beyond data reweighting approaches (Kremer et al., 2023).
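The MMD term itself is straightforward to compute for discrete candidate laws. The sketch below (toy data and a particle grid of our own choosing, not the KMM algorithm itself) evaluates the squared MMD between a candidate measure supported off the observed points and the empirical measure.

```python
# Squared MMD between a weighted candidate law and the empirical law.
import numpy as np

def gauss_kernel(a, b, sigma=0.5):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / sigma**2)

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, 200)                 # empirical sample, weights 1/n
particles = np.linspace(-3.0, 3.0, 50)        # candidate support, "off data"
q = np.full(50, 1.0 / 50)                     # candidate weights

def mmd2(p_pts, p_wts, data):
    """||mean embedding of (p_pts, p_wts) - mean embedding of data||^2 in the RKHS."""
    n = len(data)
    return (p_wts @ gauss_kernel(p_pts, p_pts) @ p_wts
            - 2.0 * p_wts @ gauss_kernel(p_pts, data).mean(axis=1)
            + gauss_kernel(data, data).sum() / n**2)

print(f"MMD^2(candidate, empirical) = {mmd2(particles, q, x):.4f}")
print(f"MMD^2(empirical, empirical) = {mmd2(x, np.full(len(x), 1/len(x)), x):.6f}")
```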

  • Dependent Data and Semiparametric Models:

In stationary $\alpha$-mixing settings (e.g., time series, partially linear models), EL-based inference incorporates nonparametric estimates $\widehat{\eta}_\gamma$ of nuisance functions via kernel smoothing, with Wilks' theorem holding under appropriate mixing-rate conditions (Chaumaray et al., 2020).

6. Empirical Performance and Applications

Canonical experiments demonstrate the utility of EL and GEL in CMR problems:

  • Heteroskedastic linear regression:

Both kernel- and neural-network-based FGEL methods achieve the lowest MSE for $\hat{\theta}$ across sample sizes, outperforming traditional two-step GMM and recent variational-moment estimators (Kremer et al., 2022, Kremer et al., 2023).

  • Instrumental-variable regression:

FGEL (kernel and neural variants) and KMM consistently yield lower test MSE than least squares, SMD, kernel/neural VMM, and DeepIV, in both parametric and nonparametric settings (Kremer et al., 2022, Kremer et al., 2023).

7. Comparative Properties and Extensions

Empirical-likelihood estimators under CMR combine semiparametric efficiency, optimization flexibility, and accommodation of infinite unconditional restriction sets via RKHS, sieves, or neural parameterizations. They contrast with GMM, which is strictly limited to unconditional restrictions, and (kernelized) variational moment-matching, which may lack exact constraint satisfaction or efficiency properties without substantial regularization and basis-approximation (Kremer et al., 2022, Kremer et al., 2023, Chib et al., 2021).

Summary Table: Key Methodological Variants

| Variant | Constraint Enforcement | Candidate Measure |
|---|---|---|
| EL / GEL | Data reweighting ($P \ll \hat{P}_n$) | Discrete (empirical) |
| KMM | MMD penalty + entropy-regularized moments | Law "off data" |
| ETEL / Sieve | Exponential tilting, sieve moments | Data reweighting |
| Neural-GEL | Network dual ($\lambda_\omega(z)$) | Flexible parametric |

Each method achieves the Chamberlain semiparametric efficiency bound for appropriately chosen function classes; selection of basis dimension, kernel, or network size is critical for practical performance (Kremer et al., 2022, Chib et al., 2021, Kremer et al., 2023).
