- The paper introduces a penalized GMM estimator that automatically debiases nonparametric IV estimates by estimating the Riesz representer.
- It achieves valid inference for both linear and nonlinear functionals using cross-fitting and regularized high-dimensional techniques.
- Empirical results highlight improved numerical stability and more accurate confidence intervals, particularly in high-dimensional, ill-posed settings.
Penalized GMM Inference for Functionals of Nonparametric IV Estimators
Introduction and Motivation
This paper introduces a penalized GMM (PGMM) approach for inference on general functionals of nonparametric instrumental variable (NPIV) estimators, focusing on both linear and nonlinear functionals in high-dimensional and ill-posed settings with endogeneity. The main technical contribution is a PGMM estimator for the Riesz representer required for automatic debiasing of regularized ML-based NPIV estimators. The framework is motivated by the inadequacy of plug-in MLIV approaches for valid inference due to first-order regularization bias, and by the increasing prominence of ML instrumental variable (MLIV) estimators, which balance flexibility against ill-posedness through regularization and model selection.
The proposed PGMM-based automatic debiasing mechanism is structurally insensitive to the form of the functional considered, generalizes the Lasso minimum-distance Riesz estimator, and is shown to be the only non-minimax automatic RR estimator in the NPIV context.
Statistical Framework
The canonical NPIV model is
Y = γ₀(X) + ε,  E[ε | Z] = 0,
where X may be high-dimensional and endogenous, Z are instruments, and the object of interest is a functional θ₀ = E[m(W, γ₀)]. The ill-posed nature of the problem is highlighted: estimation of γ₀ amounts to inversion of a compact linear operator, which is highly sensitive to errors, hence regularization is inevitable in practice.
When the inferential target is a functional, the regularization bias propagates nontrivially: plug-in estimators for θ₀ constructed from an MLIV fit γ̂ display severe coverage distortions, since the regularization bias dominates the sampling variability even as n → ∞.
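As a concrete illustration of the setup (a hypothetical toy design, not the paper's simulation), the following sketch generates data from the NPIV model with an unobserved confounder U driving the endogeneity, so that E[ε | X] ≠ 0 while E[ε | Z] = 0:

```python
import numpy as np

# Hypothetical NPIV data-generating sketch: the instrument Z shifts X,
# while a common shock U makes X endogenous.
rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=n)                             # instrument
U = rng.normal(size=n)                             # unobserved confounder
X = 0.8 * Z + 0.6 * U + 0.2 * rng.normal(size=n)   # endogenous regressor
gamma0 = lambda x: np.sin(x) + 0.5 * x             # structural function
eps = 0.7 * U + 0.3 * rng.normal(size=n)           # correlated with X, not Z
Y = gamma0(X) + eps

# Endogeneity check: eps correlates with X but is (near) orthogonal to Z.
print(np.corrcoef(X, eps)[0, 1])   # clearly nonzero
print(np.corrcoef(Z, eps)[0, 1])   # approximately zero
```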
The Neyman-orthogonal influence function-based approach is adopted: the construction
ψ(W, θ, γ, α) = m(W, γ) − θ + α(Z)[Y − γ(X)]
eliminates first-order bias in θ̂ by inclusion of the suitable Riesz representer α₀. This representation ensures local robustness (Neyman orthogonality) and admits root-n inference provided appropriate estimation rates for both γ and the Riesz representer α.
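The orthogonal-score estimator implied by this construction can be sketched in a few lines (names and toy inputs are illustrative, not the paper's API; the nuisance fits are assumed precomputed):

```python
import numpy as np

# Minimal sketch of the orthogonal-score estimator for theta0 = E[m(W, gamma0)]:
# the correction term alpha_hat(Z) * (Y - gamma_hat(X)) removes the
# first-order regularization bias of the plug-in average.
def debiased_theta(m_hat, alpha_hat, resid):
    """m_hat[i] = m(W_i, gamma_hat), alpha_hat[i] = alpha_hat(Z_i),
    resid[i] = Y_i - gamma_hat(X_i)."""
    psi = m_hat + alpha_hat * resid            # orthogonal score, up to -theta
    theta = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(psi))   # plug-in standard error
    return theta, se

# Toy usage with synthetic score components (illustration only).
rng = np.random.default_rng(1)
m_hat = 2.0 + rng.normal(size=1000)            # stand-in for m(W_i, gamma_hat)
theta, se = debiased_theta(m_hat, rng.normal(size=1000), rng.normal(size=1000))
print(theta, se)
```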
Penalized GMM for Automatic Riesz Representer Estimation
In contrast to parametric and analytic regularization approaches (e.g., explicit Lasso RR, neural nets, minimax strategies), the PGMM estimator for α₀ is constructed by directly exploiting the population orthogonality implied by the linearity of the functional: E[α₀(Z) γ(X)] = E[m(W, γ)] for all γ in the approximating class.
Discretizing this restriction with a dictionary b(X) = (b₁(X), …, b_p(X)) and approximating α₀ within a high-dimensional basis q(Z), so α_ρ(z) = q(z)′ρ, the estimation is posed as a high-dimensional, over-parameterized GMM moment problem with an ℓ₁ penalty:
ĝⱼ(ρ) = (1/n) Σᵢ [ m(Wᵢ, bⱼ) − q(Zᵢ)′ρ · bⱼ(Xᵢ) ],  ρ̂ = argmin_ρ { ĝ(ρ)′ Ŵ ĝ(ρ) + λ‖ρ‖₁ },
yielding α̂(z) = q(z)′ρ̂. The procedure accommodates basis dimensions far exceeding the sample size, with identification secured via a high-dimensional restricted eigenvalue condition. The orthogonality of the influence function motivates this construction and guarantees that the estimator does not require an analytical form for the RR—even for complex nonlinear or ill-posed problems.
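A simplified version of this Riesz representer step can be sketched as follows (identity weighting and a basic proximal-gradient solve, not the paper's exact criterion or implementation; all names are illustrative):

```python
import numpy as np

# Penalized-GMM Riesz representer sketch. With sample moments
#   M[j] = mean_i m(W_i, b_j)  and  G[j, k] = mean_i b_j(X_i) q_k(Z_i),
# solve  min_rho ||M - G rho||^2 + lam * ||rho||_1  by proximal gradient
# (ISTA); the fitted representer is alpha_hat(z) = q(z) @ rho_hat.
def pgmm_riesz(bX, qZ, m_b, lam, n_iter=5000):
    n = bX.shape[0]
    M = m_b.mean(axis=0)                     # (p,) target moments
    G = bX.T @ qZ / n                        # (p, d) cross-moment matrix
    step = 1.0 / np.linalg.norm(G, 2) ** 2   # safe step size (1 / Lipschitz)
    rho = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = G.T @ (G @ rho - M)           # gradient of the quadratic part
        z = rho - step * grad
        rho = np.sign(z) * np.maximum(np.abs(z) - step * lam / 2, 0.0)
    return rho

# Toy usage: for the mean functional m(W, gamma) = gamma(X), the RR is
# alpha0(Z) = 1, so rho_hat should load on the constant basis function.
rng = np.random.default_rng(2)
n = 4000
Z = rng.normal(size=n)
X = 0.8 * Z + 0.6 * rng.normal(size=n)
bX = np.column_stack([np.ones(n), X])        # dictionary in X
qZ = np.column_stack([np.ones(n), Z])        # basis in Z
rho = pgmm_riesz(bX, qZ, m_b=bX, lam=0.01)
print(rho)   # approximately [1, 0]
```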
Asymptotic Theory
Linear Functionals
Root-n asymptotic normality holds for the debiased estimator θ̂ if
- γ̂ converges in the projected L₂ (mean-square) norm at some rate r_γ = o(1) (can be slow due to ill-posedness),
- α̂ converges in L₂ at a rate r_α of order √(s log d / n) (with s the RR's effective sparsity, under suitable regularization),
- the product of the two nuisance rates satisfies r_γ · r_α = o(n^{−1/2}),
- regular moment and design conditions are satisfied.
These conditions are highly permissive: projected mean-square convergence for γ̂ is attainable for a wide range of MLIV estimators (Double Lasso, kernel IV, minimax, etc.), and the estimation error in the RR does not inflate sampling error beyond the orthogonalization margin.
Nonlinear Functionals
The extension to nonlinear functionals (e.g., average consumer surplus, own-price elasticity) requires Gateaux/Fréchet differentiability of γ ↦ m(W, γ), restricts attention to functionals for which the Riesz representation remains well-defined and linear in the perturbation direction, and, due to the lack of direct orthogonality, requires stronger convergence of γ̂: fast enough in the standard L₂ norm that the second-order remainder is o_p(n^{−1/2}), typically necessitating ‖γ̂ − γ₀‖ = o_p(n^{−1/4}).
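Why nonlinear functionals demand faster convergence of γ̂ can be seen from a standard second-order expansion (general debiased-ML reasoning, not quoted from the paper): the first-order term is handled by the orthogonal score, so the quadratic remainder must itself vanish faster than the sampling error.

```latex
\theta(\hat\gamma) - \theta(\gamma_0)
  = \underbrace{d\theta(\gamma_0)[\hat\gamma - \gamma_0]}_{\text{removed by the orthogonal score}}
  + \underbrace{O\!\left(\lVert \hat\gamma - \gamma_0 \rVert_{L_2}^{2}\right)}_{\text{must be } o_p(n^{-1/2})}
\;\Longrightarrow\;
\lVert \hat\gamma - \gamma_0 \rVert_{L_2} = o_p(n^{-1/4}).
```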
All inference is performed via sample splitting and cross-fitting. For nonlinear functionals in particular, double cross-fitting is required to prevent overfitting in estimation of the RR moment system, since those moments depend on the estimated γ̂.
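A minimal sketch of the cross-fitting step (with hypothetical fit_gamma / fit_alpha learner callbacks and W collapsed to X for brevity; this is single cross-fitting, not the paper's double cross-fitting code):

```python
import numpy as np

# K-fold cross-fitting for the debiased functional: nuisances are fit on
# the complement of each fold and evaluated only on held-out observations.
def cross_fit_theta(Y, X, Z, m, fit_gamma, fit_alpha, K=5, seed=0):
    n = len(Y)
    folds = np.random.default_rng(seed).integers(0, K, size=n)
    psi = np.empty(n)
    for k in range(K):
        tr, te = folds != k, folds == k
        gamma_k = fit_gamma(Y[tr], X[tr], Z[tr])   # any MLIV learner
        alpha_k = fit_alpha(X[tr], Z[tr])          # any Riesz learner
        psi[te] = m(X[te], gamma_k) + alpha_k(Z[te]) * (Y[te] - gamma_k(X[te]))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)

# Toy usage with trivial stand-in learners (illustration only):
rng = np.random.default_rng(3)
n = 2000
Z = rng.normal(size=n)
X = 0.8 * Z + 0.6 * rng.normal(size=n)
Y = 2.0 * X + rng.normal(size=n)
fit_gamma = lambda y, x, z: (lambda v: 2.0 * v)        # "oracle" gamma fit
fit_alpha = lambda x, z: (lambda v: np.zeros(len(v)))  # no-correction stand-in
m = lambda w, g: g(w)                                  # mean functional E[gamma(X)]
theta, se = cross_fit_theta(Y, X, Z, m, fit_gamma, fit_alpha)
print(theta, se)
```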
Empirical and Simulation Results
Multiple Monte Carlo experiments are provided, focusing primarily on the weighted average derivative and own-price elasticity functionals in ill-posed IV systems with moderate to high dimensions.
- Key results: plug-in estimators display severe undercoverage; empirical coverage of nominal 95% confidence intervals collapses quickly as the sample size grows, often to below 5%. Coverage failure is pronounced for functional targets, even at moderate sample sizes, and bias is non-negligible.
- Debiased estimators (both analytical and PGMM): achieve near-nominal coverage (90–96%) across all regimes, with stable bias and variance.
- Numerical stability: The automatic PGMM debiasing procedure exhibits stronger numerical stability and lower variance than analytical RR-based approaches, especially in small-sample/high-dimensional regimes. This is attributed to the avoidance of analytic RR matrix inversions and improved conditioning.
In the context of semiparametric demand estimation for differentiated products using IRI scanner data, semiparametric (automatic debiased) own-price elasticities are approximately 20% more elastic (in magnitude) than parametric logit demand estimates. Importantly, the magnitude and sign of the debiasing correction are heterogeneous across products, ranging from negligible for some SKUs (Stock Keeping Units) to multiples of the analytical standard error for others. This differential effect highlights the practical importance of automatic debiasing for valid inference in empirical IO.
Algorithmic and Computational Aspects
The PGMM optimization leverages coordinate descent with active set and adaptive/diagonal penalty loading variants. Cross-validated selection of the penalty parameter is adopted for stability in finite samples. Full details of efficient high-dimensional implementation, including active-set exploitation for computational gains, are developed and empirically benchmarked.
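A stripped-down version of this solver style can be sketched as follows (identity weighting, fixed penalty, no adaptive loadings; the active-set trick restricts most sweeps to the current nonzero coordinates, with periodic full sweeps so variables can enter or leave):

```python
import numpy as np

# Coordinate descent with soft-thresholding for a lasso-type criterion
#   0.5 * ||y - A w||^2 + lam * ||w||_1
# (a simplified stand-in for the PGMM objective, not the paper's
# production implementation).
def cd_lasso(A, y, lam, sweeps=200, tol=1e-10):
    n, d = A.shape
    col_sq = (A ** 2).sum(axis=0)            # per-coordinate curvature
    w = np.zeros(d)
    r = y.astype(float).copy()               # running residual y - A @ w
    for it in range(sweeps):
        # Every 10th sweep scans all coordinates; others only the support.
        idx = np.arange(d) if it % 10 == 0 else np.flatnonzero(w)
        max_delta = 0.0
        for j in idx:
            if col_sq[j] == 0.0:
                continue
            rho = A[:, j] @ r + col_sq[j] * w[j]          # partial correlation
            w_new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if w_new != w[j]:
                r -= A[:, j] * (w_new - w[j])             # update residual
                max_delta = max(max_delta, abs(w_new - w[j]))
                w[j] = w_new
        if it % 10 == 0 and max_delta < tol:
            break                             # converged on a full sweep
    return w

# Toy usage: recover a sparse coefficient vector.
rng = np.random.default_rng(4)
A = rng.normal(size=(100, 20))
w_true = np.zeros(20)
w_true[0], w_true[1] = 3.0, -2.0
y = A @ w_true + 0.1 * rng.normal(size=100)
w = cd_lasso(A, y, lam=5.0)
print(np.flatnonzero(np.abs(w) > 0.5))   # expected support: coordinates 0 and 1
```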
Theoretical and Practical Implications, Directions for Future Research
By integrating Neyman-orthogonal machinery, automatic RR estimation, penalized GMM, and modern MLIV estimators, this work provides a robust, theoretically justified, and computationally scalable framework for valid inference on functionals of nonparametric models under ill-posedness and endogeneity.
Theoretical implications:
- This approach removes the obstacle of bias correction for functionals when explicit RR formulas are unavailable, thus generalizing debiased machine learning to the challenging NPIV/MLIV context.
- Rates derived clarify the differing requirements for linear vs. nonlinear functionals, and point to the slowest admissible convergence for valid inference.
Practical implications:
- In high-dimensional structural estimation (demand/IO, policy evaluation), using the proposed framework is essential for valid confidence intervals.
- Heterogeneous debiasing corrections at the functional level indicate that failing to debias can result in severe misestimation of policy-relevant objects—especially in applied work that relies on plug-in ML methods.
- The open-source implementation makes the approach easily applicable to empirical problems in modern econometric practice.
Speculation on future research:
- Extension to irregular functionals and sup-norm inference.
- Relaxing convergence constraints for nonlinear functional inference in ill-posed problems.
- Seamless integration with even more sophisticated ML base-learners (e.g., deep nets, ensemble methods) within the RR estimation framework.
Conclusion
The penalized GMM framework for automatic debiasing of functionals of nonparametric IV estimators enables asymptotically valid, robust inference when modern regularized machine learning approaches are employed for high-dimensional, ill-posed problems. The method is algorithmically practical and statistically optimal under minimal conditions, and empirical evidence from both synthetic and real economic data supports its necessity over conventional plug-in alternatives (2603.29889).