Empirical Likelihood Estimation under CMR
- The paper introduces a semiparametric framework that casts estimation as an infinite-dimensional empirical likelihood problem to achieve the efficiency bound.
- It employs RKHS-, neural network-, and sieve-based approximations to enforce the conditional moment restrictions.
- Empirical studies demonstrate that these methods outperform traditional estimators by significantly reducing mean squared error.
Empirical-likelihood (EL) estimators under conditional moment restrictions (CMR) constitute a foundational framework for inference in semiparametric econometrics, statistical machine learning, and causal inference. The central insight is that, in models identified by conditional moments, estimation can be cast as an infinite-dimensional empirical-likelihood problem, leading to procedures that achieve efficiency bounds, enjoy robust small-sample properties, and take advantage of function-approximation frameworks such as reproducing kernel Hilbert spaces (RKHS) and neural networks. Below, the mathematical setup, principal methodologies, asymptotic properties, and implementation details are developed, referencing central developments (Kremer et al., 2022, Chaumaray et al., 2020, Kremer et al., 2023, Chib et al., 2021).
1. Statistical Formulation and Conditional Moment Restrictions
Suppose $(x_1, z_1), \ldots, (x_n, z_n)$ are i.i.d. draws from an unknown law $P_0$, $\theta \in \Theta \subset \mathbb{R}^p$ is the finite-dimensional parameter of interest, and $\psi(x; \theta) \in \mathbb{R}^m$ is a prescribed moment function. The conditional moment restriction stipulates:
$E_{P_0}[\psi(X; \theta_0) \mid Z] = 0 \quad P_0\text{-a.s.}$
for a unique $\theta_0 \in \Theta$. This model class generalizes classical mean regression and instrumental-variable settings, incorporating nonparametric or semiparametric nuisance components as needed (Kremer et al., 2022, Chib et al., 2021).
A key equivalence, via the law of iterated expectations, is:
$E[\psi(X; \theta_0) \mid Z] = 0 \ \text{a.s.} \iff E[h(Z)^\top \psi(X; \theta_0)] = 0 \quad \text{for all } h \in \mathcal{H},$
yielding an infinite system of unconditional moment restrictions, indexed by test functions $h$. The solution set can be abstractly represented as the vanishing of the functional $\Psi(\theta)\colon h \mapsto E[h(Z)^\top \psi(X; \theta)]$ in the dual of a Hilbert space $\mathcal{H}$ of test functions: $\Psi(\theta_0) = 0$ (Kremer et al., 2022, Kremer et al., 2023).
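To make the setup concrete, the following minimal Python sketch simulates a linear instrumental-variable model (a hypothetical data-generating process, not taken from the cited papers) and checks that the unconditional moments $E[h(Z)\,\psi(X;\theta_0)]$ vanish for several test functions $h$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear IV model (illustrative): y = theta0 * t + u, with t endogenous
# through u and z a valid instrument, so E[y - theta0 * t | z] = 0.
n, theta0 = 500, 1.5
z = rng.normal(size=n)
u = rng.normal(size=n)
t = 0.8 * z + 0.5 * u + 0.1 * rng.normal(size=n)   # endogeneity via u
y = theta0 * t + u

def psi(theta):
    """Moment function psi(X; theta) = y - theta * t, scalar-valued here."""
    return y - theta * t

# The CMR implies E[h(Z) * psi(X; theta0)] = 0 for every test function h;
# the sample analogues below should be near zero up to sampling noise.
for h in (lambda s: s, lambda s: s**2, np.sin):
    print(np.mean(h(z) * psi(theta0)))
```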
2. Functional Generalized Empirical Likelihood Framework
Generalized empirical likelihood (GEL) seeks an alternative probability measure $P$ that (i) strictly enforces the continuum of moment constraints and (ii) incurs minimal divergence from the empirical measure $\hat{P}_n$. For a convex function $\varphi$ generating the divergence $D_\varphi(P \,\|\, \hat{P}_n)$, one solves:
$\min_{P}\; D_\varphi(P \,\|\, \hat{P}_n) \quad \text{s.t.} \quad E_P[h(Z)^\top \psi(X; \theta)] = 0 \ \text{for all } h \in \mathcal{H}.$
For the original empirical likelihood, $\varphi(x) = -\log x$, and the primal problem is:
$\max_{p_1, \ldots, p_n}\; \sum_{i=1}^{n} \log(n p_i) \quad \text{s.t.} \quad p_i \geq 0, \;\; \sum_{i=1}^{n} p_i = 1, \;\; \sum_{i=1}^{n} p_i\, h(z_i)^\top \psi(x_i; \theta) = 0 \ \text{for all } h \in \mathcal{H},$
where the constraints span an infinite-dimensional space (Kremer et al., 2022).
The dual emerges by introducing Lagrange multipliers $h \in \mathcal{H}$, possibly with an RKHS-norm or other regularizer on $h$. For general $\varphi$, the dual takes the form:
$\hat{\theta} = \arg\min_{\theta \in \Theta}\; \sup_{h \in \mathcal{H}}\; \frac{1}{n} \sum_{i=1}^{n} -\varphi^{*}\!\big(h(z_i)^\top \psi(x_i; \theta)\big),$
where $\varphi^{*}$ is the convex conjugate of $\varphi$; for EL, with the normalized generator $\varphi(x) = -\log x + x - 1$, one has $\varphi^{*}(v) = -\log(1 - v)$ (Kremer et al., 2022).
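A minimal sketch of the resulting saddle-point computation, restricting the test functions to a fixed finite dictionary $h_j(z) \in \{1, z, z^2\}$ rather than a full RKHS (a deliberate simplification), and reusing the simulated data and `psi` from the sketch in Section 1:

```python
import numpy as np
from scipy.optimize import minimize

# Profile-EL dual with a finite dictionary of test functions: for each theta,
# solve max over lam of sum log(1 + lam' g_i), then minimize over theta.
def moments(theta):
    """g_i(theta) = h_j(z_i) * psi_i for h_j(z) in {1, z, z^2}."""
    ps = psi(theta)
    return np.column_stack([ps, z * ps, z**2 * ps])   # shape (n, 3)

def el_inner(theta):
    """Inner concave maximization over the Lagrange multipliers."""
    g = moments(theta)
    def neg_obj(lam):
        v = 1.0 + g @ lam
        if np.any(v <= 1e-8):            # keep the log argument positive
            return 1e10
        return -np.sum(np.log(v))
    res = minimize(neg_obj, np.zeros(g.shape[1]), method="Nelder-Mead")
    return -res.fun                       # maximized inner objective

# Outer minimization over theta (1-d grid search for illustration).
grid = np.linspace(0.5, 2.5, 81)
theta_hat = grid[np.argmin([el_inner(th) for th in grid])]
print(theta_hat)                          # close to theta0 = 1.5
```

The inner objective $\sum_i \log(1 + \lambda^\top g_i)$ is the EL instance $-\varphi^{*}(v) = \log(1 - v)$ up to the sign convention on the multiplier.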
3. Asymptotic Properties and Efficiency
Under compactness of $\Theta$, continuity of $\theta \mapsto \psi(x; \theta)$, non-singularity of the efficient information matrix
$E\big[D(Z)^\top \Sigma(Z)^{-1} D(Z)\big], \quad D(Z) = E[\nabla_\theta \psi(X; \theta_0) \mid Z], \quad \Sigma(Z) = E[\psi(X; \theta_0)\, \psi(X; \theta_0)^\top \mid Z],$
and uniform Donsker conditions on the class $\{(x, z) \mapsto h(z)^\top \psi(x; \theta)\}$, one has:
- Consistency:
$\hat{\theta}_n \to_p \theta_0$, with $\hat{\theta}_n - \theta_0 = O_p(n^{-1/2})$.
- Asymptotic normality:
$\sqrt{n}\,(\hat{\theta}_n - \theta_0) \to_d N\big(0,\; (E[D(Z)^\top \Sigma(Z)^{-1} D(Z)])^{-1}\big),$
coincident with the semiparametric efficiency bound of Chamberlain (1987) (Kremer et al., 2022, Kremer et al., 2023, Chib et al., 2021).
In settings where sieve-based or kernel-based approximations are used, the sieve dimension must grow with the sample size at an appropriate rate (under correct specification) to guarantee efficiency (Chib et al., 2021).
4. Solution Strategies and Computation
- RKHS-based implementation:
Let $\mathcal{H}$ be the RKHS of a universal, strictly positive-definite kernel $k$ on the instrument space. By the representer theorem, the maximizer has the form:
$h^{*}(\cdot) = \sum_{i=1}^{n} k(z_i, \cdot)\, \alpha_i, \quad \alpha_i \in \mathbb{R}^m,$
reducing the infinite-dimensional optimization over $\mathcal{H}$ to a finite-dimensional problem in the coefficients $\alpha$. Algorithmic steps include alternating or simultaneous maximization over $\alpha$ and minimization over $\theta$ (using, e.g., L-BFGS), leveraging Danskin's theorem for gradient computations (Kremer et al., 2022, Kremer et al., 2023); see the RKHS sketch after this list.
- Neural network-based implementation:
Parametrize the dual function by a feed-forward neural network $h_\phi$. The EL criterion becomes:
$\hat{\theta} = \arg\min_{\theta}\; \max_{\phi}\; \frac{1}{n} \sum_{i=1}^{n} -\varphi^{*}\!\big(h_\phi(z_i)^\top \psi(x_i; \theta)\big).$
Training employs stochastic min-max solvers suited for nonconvex-concave games (e.g., Optimistic Adam) (Kremer et al., 2022); see the neural sketch after this list.
- Sieve-based and ETEL approach:
Approximate the CMR via finite sieves of basis functions $q_1(z), \ldots, q_K(z)$, expanding the unconditional moments as $E[q_j(Z)\, \psi(X; \theta)] = 0$ for $j = 1, \ldots, K$. Optimization proceeds via Newton or quasi-Newton solvers in the inner loop (for the tilting parameters) and standard optimizers in the outer loop (for $\theta$) (Chib et al., 2021); an ETEL sketch follows this list.
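First, a sketch of the RKHS route under the representer-theorem reduction. For speed, the dual function is anchored on $m$ inducing points rather than all $n$ data points (a Nyström-style simplification of the full expansion); the Gaussian kernel, bandwidth, regularization constant, and grid search are illustrative choices, and the simulated data and `psi` from Section 1 are reused:

```python
import numpy as np
from scipy.optimize import minimize

def k_gauss(a, b, bw=1.0):
    """Gaussian kernel matrix between 1-d point sets a and b."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * bw**2))

m = 25
z_anchor = np.quantile(z, np.linspace(0.02, 0.98, m))    # inducing points
K_nm = k_gauss(z, z_anchor)         # evaluations h(z_i) = (K_nm @ alpha)_i
K_mm = k_gauss(z_anchor, z_anchor)  # RKHS norm: ||h||^2 = alpha' K_mm alpha

def el_profile(theta, reg=1e-3):
    """Inner dual: max_alpha mean log(1 + h(z_i) psi_i) - reg * ||h||^2."""
    ps = psi(theta)
    def neg_obj(alpha):
        v = np.clip(1.0 + (K_nm @ alpha) * ps, 1e-10, None)
        return -(np.mean(np.log(v)) - reg * alpha @ K_mm @ alpha)
    res = minimize(neg_obj, np.zeros(m), method="L-BFGS-B")
    return -res.fun

grid = np.linspace(0.5, 2.5, 41)
print(grid[np.argmin([el_profile(th) for th in grid])])   # ~ theta0
```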
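Next, the neural variant: the dual function is a small MLP `h_phi`, and the min-max game is solved by alternating gradient steps. Plain Adam stands in for the optimistic solvers the papers recommend; the architecture, learning rates, and step count are illustrative, and `z`, `t`, `y` come from the Section 1 sketch:

```python
import torch

torch.manual_seed(0)
zt = torch.tensor(z, dtype=torch.float32).unsqueeze(1)
tt = torch.tensor(t, dtype=torch.float32)
yt = torch.tensor(y, dtype=torch.float32)

theta = torch.tensor(0.0, requires_grad=True)
h_phi = torch.nn.Sequential(
    torch.nn.Linear(1, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1)
)
opt_theta = torch.optim.Adam([theta], lr=1e-2)
opt_phi = torch.optim.Adam(h_phi.parameters(), lr=1e-2)

def el_objective():
    """EL criterion mean log(1 + h_phi(z) * psi) at the current (theta, phi)."""
    psi_t = yt - theta * tt
    v = 1.0 + h_phi(zt).squeeze(1) * psi_t
    return torch.log(torch.clamp(v, min=1e-6)).mean()

for step in range(2000):
    # ascent in phi (inner max), descent in theta (outer min)
    opt_phi.zero_grad(); (-el_objective()).backward(); opt_phi.step()
    opt_theta.zero_grad(); el_objective().backward(); opt_theta.step()

print(float(theta))   # should approach theta0
```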
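Finally, a sketch of the sieve/ETEL route: the CMR is expanded through a low-order polynomial sieve in $z$ (the basis and $K = 4$ are illustrative), the inner convex tilting problem is solved by a quasi-Newton method, and the ETEL criterion $\sum_i \log \hat{p}_i(\theta)$ is maximized over a grid; data and `psi` again come from Section 1:

```python
import numpy as np
from scipy.optimize import minimize

def sieve_moments(theta, K=4):
    """Unconditional moments g_ij = q_j(z_i) * psi_i with q_j(z) = z^j."""
    ps = psi(theta)
    g = np.column_stack([z**j * ps for j in range(K)])
    return g / np.std(g, axis=0)        # rescale columns for stability

def etel_criterion(theta):
    g = sieve_moments(theta)
    # Inner convex dual: lam_hat = argmin (1/n) sum exp(lam' g_i), whose
    # first-order condition enforces sum_i p_i g_i = 0 for the weights below.
    obj = lambda lam: np.mean(np.exp(g @ lam))
    lam = minimize(obj, np.zeros(g.shape[1]), method="BFGS").x
    w = np.exp(g @ lam)
    p = w / w.sum()                      # exponentially tilted weights
    return np.sum(np.log(p))             # ETEL log-likelihood (up to constants)

grid = np.linspace(0.5, 2.5, 41)
print(grid[np.argmax([etel_criterion(th) for th in grid])])   # ~ theta0
```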
5. Key Variants and Theoretical Extensions
- Kernel Method of Moments (KMM):
KMM replaces the divergence penalty in the GEL functional by a maximum mean discrepancy (MMD) between a candidate law and the empirical law, together with an entropy regularization term. This allows candidate distributions to place mass "off" the empirical data, yielding:
$R_\epsilon^\varphi(\theta) = \inf_{P \ll \omega}\; \tfrac{1}{2}\,\mathrm{MMD}^2(P, \hat{P}_n; \mathcal{F}) + \epsilon\, D_\varphi(P \,\|\, \omega) \quad \text{s.t.} \quad E_P[h(Z)^\top \psi(X; \theta)] = 0 \ \text{for all } h \in \mathcal{H}$
Dual representations, representer-theorem reductions, and practical stochastic gradient algorithms are employed. KMM achieves semiparametric efficiency and offers flexibility beyond data-reweighting approaches (Kremer et al., 2023); an MMD computation is sketched after this list.
- Dependent Data and Semiparametric Models:
In stationary mixing settings (e.g., time series, partially linear models), EL-based inference incorporates nonparametric estimates of nuisance functions via kernel smoothing, with Wilks' theorem holding under appropriate mixing-rate conditions (Chaumaray et al., 2020); a kernel-smoothing sketch follows the MMD example below.
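The following self-contained snippet computes the squared MMD between a weighted candidate law, supported on a grid of points off the data, and an empirical sample, for a Gaussian kernel; the support grid, weights, and bandwidth are illustrative, and this is only the discrepancy term of the KMM objective, not the full constrained problem:

```python
import numpy as np

def mmd2(s, w, x, bw=1.0):
    """Squared MMD between P = sum_j w_j delta_{s_j} and the empirical law of x."""
    k = lambda a, b: np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * bw**2))
    n = len(x)
    return (w @ k(s, s) @ w
            - 2.0 * w @ k(s, x).sum(axis=1) / n
            + k(x, x).sum() / n**2)

x = np.random.default_rng(1).normal(size=300)   # observed sample
s = np.linspace(-3.0, 3.0, 50)                  # off-data support grid
w = np.exp(-s**2 / 2.0); w /= w.sum()           # candidate weights ~ N(0, 1)
print(mmd2(s, w, x))   # small, since the candidate matches the sampling law
```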
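And a minimal Nadaraya-Watson sketch of the kernel-smoothed nuisance estimation step referenced above; the bandwidth is fixed arbitrarily, and the mixing-rate conditions needed for Wilks' theorem are not addressed here:

```python
import numpy as np

def nw_estimate(z0, z_obs, v_obs, bw=0.3):
    """Estimate the nuisance E[V | Z = z0] by locally weighted averaging."""
    w = np.exp(-((z_obs - z0) ** 2) / (2 * bw**2))
    return (w * v_obs).sum() / w.sum()

rng = np.random.default_rng(2)
zz = rng.uniform(-2.0, 2.0, 500)
vv = np.sin(zz) + rng.normal(scale=0.2, size=500)
print(nw_estimate(0.5, zz, vv))   # ~ sin(0.5) = 0.479
```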
6. Empirical Performance and Applications
Canonical experiments demonstrate the utility of EL and GEL in CMR problems:
- Heteroskedastic linear regression:
Both kernel-based and neural-FGEL methods achieve the lowest parameter MSE across sample sizes, outperforming traditional two-step GMM and recent variational-moment estimators (Kremer et al., 2022, Kremer et al., 2023).
- Instrumental-variable regression:
FGEL (kernel/neural) and KMM methods consistently yield lower test MSEs than least squares, SMD, kernel/neural VMM, and DeepIV, in both parametric and nonparametric settings (Kremer et al., 2022, Kremer et al., 2023).
7. Comparative Properties and Extensions
Empirical-likelihood estimators under CMR combine semiparametric efficiency, optimization flexibility, and accommodation of infinite unconditional restriction sets via RKHS, sieves, or neural parameterizations. They contrast with GMM, which is strictly limited to unconditional restrictions, and (kernelized) variational moment-matching, which may lack exact constraint satisfaction or efficiency properties without substantial regularization and basis-approximation (Kremer et al., 2022, Kremer et al., 2023, Chib et al., 2021).
Summary Table: Key Methodological Variants
| Variant | Constraint Enforcement | Candidate Measure |
|---|---|---|
| EL / GEL | Data reweighting (weights $p_i$) | Discrete (empirical) |
| KMM | MMD-based + entropy, moments | Law "off data" |
| ETEL/Sieve | Exp. tilting, sieve moments | Data reweighting |
| Neural-GEL | Network dual ($h_\phi$) | Flexible parametric |
Each method achieves the Chamberlain semiparametric efficiency bound for appropriately chosen function classes; selection of basis dimension, kernel, or network size is critical for practical performance (Kremer et al., 2022, Chib et al., 2021, Kremer et al., 2023).