Doubly Robust Estimator in Causal Inference
- The doubly robust estimator is a semiparametric method that combines an outcome-regression model and a propensity-score model, yielding consistent estimation when either model is correctly specified.
- It applies advanced techniques like sample-splitting and martingale central limit theory to address challenges in adaptive experiments and dependent data.
- Empirical results show that adaptive DR estimators improve finite-sample stability and efficiency through regularization and cross-fitting methods.
A doubly robust (DR) estimator is a semiparametric technique that achieves consistency for a target parameter when either of two working models ("nuisance models") is correctly specified, but not necessarily both. In modern causal inference, DR estimators are central tools for unbiased policy evaluation, semiparametric estimation, and inference in settings where the outcome and treatment assignment mechanisms may be complex or adaptively learned. The construction and analysis of DR estimators are deeply entwined with efficient influence-function theory, sample-splitting, and orthogonality of estimating equations. Current research explores their application to adaptive experiments, high-dimensional models, off-policy evaluation in bandits, and complex observational designs.
1. Classical Doubly Robust Construction and Properties
The canonical DR estimator is formulated in the context of estimating a policy-value functional of the form

$$V(\pi^e) = \mathbb{E}\Big[\sum_{a} \pi^e(a \mid X)\,\mu(a, X)\Big],$$

where $\pi^e$ is an evaluation policy, $\pi^b$ is a logging (behavior) policy, and $\mu(a, x) = \mathbb{E}[Y \mid A = a, X = x]$ is an outcome regression.
The classical DR estimator is

$$\hat{V}_{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{\pi^e(A_i \mid X_i)}{\hat{\pi}^b(A_i \mid X_i)}\big(Y_i - \hat{\mu}(A_i, X_i)\big) + \sum_{a}\pi^e(a \mid X_i)\,\hat{\mu}(a, X_i)\right].$$
It is termed "doubly robust" because consistency for $V(\pi^e)$ holds if either the outcome regression $\hat{\mu}$ or the propensity model $\hat{\pi}^b$ is correctly specified; if both are, asymptotic efficiency is attained. The estimator is unbiased under either scenario:
- If $\hat{\mu}$ is correct, the weighted residual term vanishes in expectation because $\hat{\mu}(a, x)$ matches the true conditional mean $\mathbb{E}[Y \mid A = a, X = x]$.
- If $\hat{\pi}^b$ is correct, the importance-weighted residual exactly cancels the bias of a misspecified $\hat{\mu}$, ensuring unbiasedness for $V(\pi^e)$.
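In code, the classical DR estimator above can be sketched as follows (a minimal NumPy version; the function name and array layout are illustrative, and the nuisance estimates are assumed to be supplied by the user's own learners):

```python
import numpy as np

def dr_policy_value(A, Y, pi_e, pi_b_hat, mu_hat):
    """Classical doubly robust (AIPW) estimate of the policy value V(pi_e).

    A        : (n,) observed actions (integer indices)
    Y        : (n,) observed outcomes
    pi_e     : (n, K) evaluation-policy probabilities pi_e(a | X_i)
    pi_b_hat : (n, K) estimated logging-policy probabilities
    mu_hat   : (n, K) outcome-regression predictions mu_hat(a, X_i)
    """
    idx = np.arange(len(A))
    # Importance-weighted residual: corrects the bias of the regression term.
    residual = pi_e[idx, A] / pi_b_hat[idx, A] * (Y - mu_hat[idx, A])
    # Direct-method term: regression predictions averaged under pi_e.
    direct = (pi_e * mu_hat).sum(axis=1)
    return float(np.mean(residual + direct))
```

If `mu_hat` is exactly correct, the residual term has mean zero and the estimate reduces to the direct method; if `pi_b_hat` is exact, the weighted residual removes the bias of a wrong `mu_hat`, mirroring the two bullets above.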
2. Adaptive Experiments and Dependent Data: The ADR Estimator
In adaptive experiments, the logging policy evolves with time, based on past data. Here, observations are dependent, and classical i.i.d.-based DR theory does not directly apply. The Adaptive Doubly Robust (ADR) estimator addresses these challenges by adopting a time-indexed sample-splitting approach:
- At time $t$, fit nuisance functions $\hat{\pi}_{t-1}$ (logging policy) and $\hat{\mu}_{t-1}$ (outcome regression) using only the history $\Omega_{t-1} = \{(X_s, A_s, Y_s)\}_{s < t}$.
- Compute the pseudo-outcome
$$\hat{\Gamma}_t = \frac{\pi^e(A_t \mid X_t)}{\hat{\pi}_{t-1}(A_t \mid X_t)}\big(Y_t - \hat{\mu}_{t-1}(A_t, X_t)\big) + \sum_{a}\pi^e(a \mid X_t)\,\hat{\mu}_{t-1}(a, X_t).$$
- Aggregate to form
$$\hat{V}_{\mathrm{ADR}} = \frac{1}{T}\sum_{t=1}^{T}\hat{\Gamma}_t.$$
Martingale central limit theory and semiparametric arguments show that, under uniform overlap, consistent nuisance estimation rates, and regularity, the estimator is asymptotically normal:
$$\sqrt{T}\big(\hat{V}_{\mathrm{ADR}} - V(\pi^e)\big) \xrightarrow{d} \mathcal{N}(0, \sigma^2),$$
where $\sigma^2$ is the influence-function variance. Sample-splitting over time mitigates the violation of Donsker conditions by modern learners, establishing valid inference even with dependent, adaptively generated samples (Kato et al., 2020).
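The time-indexed recipe above can be sketched in code (a simplified version with no covariates, so the hypothetical nuisance fits reduce to running per-action means and frequencies; the pseudo-count initialization and clipping are illustrative regularization choices, not prescribed by the source):

```python
import numpy as np

def adr_estimate(A, Y, pi_e, clip=0.05):
    """ADR estimate of V(pi_e) from an adaptively collected stream.

    At each t, the logging-policy estimate pi_hat and the outcome
    regression mu_hat are fit on the history Omega_{t-1} only
    (time-indexed sample-splitting), then the pseudo-outcome Gamma_t
    is formed. Covariate-free sketch: nuisances are per-action stats.
    """
    T, K = pi_e.shape
    counts = np.ones(K)   # pseudo-counts keep early estimates bounded
    sums = np.zeros(K)
    gammas = []
    for t in range(T):
        # Nuisances from history only; clipping enforces overlap.
        pi_hat = np.clip(counts / counts.sum(), clip, 1.0)
        mu_hat = sums / counts
        a, y = A[t], Y[t]
        gamma = pi_e[t, a] / pi_hat[a] * (y - mu_hat[a]) + pi_e[t] @ mu_hat
        gammas.append(gamma)
        counts[a] += 1.0  # update history *after* forming Gamma_t
        sums[a] += y
    return float(np.mean(gammas)), np.array(gammas)
```

Updating `counts` and `sums` only after $\hat{\Gamma}_t$ is formed is the essential point: the nuisances at time $t$ never see the observation they are evaluated on.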
3. Efficiency, Rate Conditions, and Martingale Theory
DR estimators achieve semiparametric efficiency provided both the outcome regression and propensity components are consistently estimated. In the ADR construction, a bias decomposition shows that nuisance estimation errors interact multiplicatively; under bounded outcomes and uniform overlap, the product of errors in $\hat{\pi}_{t-1}$ and $\hat{\mu}_{t-1}$ must satisfy
$$\big\|\hat{\pi}_{t-1} - \pi_t\big\| \cdot \big\|\hat{\mu}_{t-1} - \mu\big\| = o_P\big(t^{-1/2}\big),$$
so that the bias vanishes asymptotically. This "product-rate" principle holds for i.i.d. and dependent samples alike and licenses the use of machine-learning algorithms with slower-than-parametric rates, provided cross-fitting or sample-splitting is employed.
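The multiplicative interaction can be made explicit. A standard calculation under the notation above (assuming $\hat{\pi}_{t-1}(a \mid x) \ge \varepsilon > 0$) gives the conditional bias of a single pseudo-outcome as

$$\mathbb{E}\big[\hat{\Gamma}_t - V(\pi^e) \,\big|\, \Omega_{t-1}\big] = \mathbb{E}_X\Big[\sum_{a} \pi^e(a \mid X)\,\frac{\pi_t(a \mid X) - \hat{\pi}_{t-1}(a \mid X)}{\hat{\pi}_{t-1}(a \mid X)}\big(\mu(a, X) - \hat{\mu}_{t-1}(a, X)\big)\Big],$$

so each summand is bounded by $\varepsilon^{-1}$ times the product of the two estimation errors, and vanishes entirely if either nuisance is exact.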
The influence function for the ADR estimator is a martingale increment:
$$\mathbb{E}\big[\hat{\Gamma}_t - V(\pi^e) \,\big|\, \Omega_{t-1}\big] = 0,$$
which permits standard variance computation and confidence-interval construction.
4. Finite-Sample Phenomena and the Logging-Policy Paradox
Simulation studies demonstrate that the ADR estimator using an estimated logging policy $\hat{\pi}_{t-1}$ often exhibits substantially lower mean squared error (MSE) than "AIPW" estimators employing the true logging policy $\pi_t$. This arises because, early in adaptive experiments, the true $\pi_t(a \mid x)$ can be near 0 or 1 for certain actions, causing instability in IPW weights. Regularized or truncated estimates $\hat{\pi}_{t-1}$ remain bounded, reducing variance. Asymptotic efficiency theory does not resolve this paradox: the limiting variances of ADR and AIPW are identical, but ADR is more stable in finite samples. This phenomenon mirrors earlier findings for propensity-score estimation in i.i.d. designs but is distinctively acute for dependent, adaptively collected data (Kato et al., 2020).
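The stabilizing effect of bounded propensity estimates can be seen in a toy calculation (illustrative numbers, not from the source): truncating the propensity away from zero caps the largest importance weights, and with it the variance of the IPW term.

```python
import numpy as np

# Toy logging probabilities early in an adaptive experiment: one action
# is nearly deterministic, so its inverse-propensity weight explodes.
pi_true = np.array([0.01, 0.50, 0.99, 0.50])
pi_trunc = np.clip(pi_true, 0.05, 0.95)  # regularized / truncated estimate

w_true = 1.0 / pi_true    # largest weight is 1 / 0.01 = 100
w_trunc = 1.0 / pi_trunc  # capped at 1 / 0.05 = 20

# Truncation shrinks the spread of the weights.
print(w_true.max(), w_trunc.max(), w_true.var() > w_trunc.var())
```

This is the finite-sample mechanism behind the paradox: the truncated weights are mildly biased but far less variable, and early-round weight explosions dominate the MSE of the true-propensity AIPW estimator.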
5. Implementation and Practical Recommendations
DR estimation requires careful selection of nuisance learners with known error rates. For logging-policy models, logistic regression or gradient-boosted trees are commonly used, with regularization or truncation to maintain overlap. For outcome regressions, random forests, neural networks, or kernel methods are suitable. Interval estimation employs the plug-in influence-function variance
$$\hat{\sigma}^2 = \frac{1}{T}\sum_{t=1}^{T}\big(\hat{\Gamma}_t - \hat{V}_{\mathrm{ADR}}\big)^2,$$
yielding intervals $\hat{V}_{\mathrm{ADR}} \pm z_{1-\alpha/2}\,\hat{\sigma}/\sqrt{T}$.
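The plug-in interval translates directly to code (a minimal sketch operating on the pseudo-outcomes $\hat{\Gamma}_t$; the function name is illustrative):

```python
import numpy as np
from statistics import NormalDist

def adr_confidence_interval(gammas, alpha=0.05):
    """Plug-in influence-function interval for the ADR estimate.

    gammas : sequence of pseudo-outcomes Gamma_t, t = 1..T.
    Returns (estimate, lower, upper) for a (1 - alpha) interval.
    """
    gammas = np.asarray(gammas, dtype=float)
    T = len(gammas)
    v_hat = gammas.mean()
    # sigma^2_hat = (1/T) * sum_t (Gamma_t - V_hat)^2
    sigma2_hat = ((gammas - v_hat) ** 2).mean()
    z = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    half = z * np.sqrt(sigma2_hat / T)
    return v_hat, v_hat - half, v_hat + half
```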
Key conditions:
- Logging probabilities must stabilize (i.e., $\pi_t(a \mid x)$ converges and remains bounded away from 0 and 1, preserving overlap).
- The product of nuisance estimation rates must satisfy $\|\hat{\pi}_{t-1} - \pi_t\| \cdot \|\hat{\mu}_{t-1} - \mu\| = o_P(t^{-1/2})$.
Sample-splitting, cross-fitting, and regularization are central to robust practical performance.
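For i.i.d. data, the cross-fitting referred to above can be sketched as follows (a minimal fold-based version; the `fit_nuisances` callback interface is a hypothetical convention, standing in for whatever learners the user chooses):

```python
import numpy as np

def crossfit_dr(A, Y, pi_e, fit_nuisances, n_folds=2, seed=0):
    """Cross-fitted DR estimate of V(pi_e).

    Nuisances for each fold are fit on the remaining folds, so a
    learner is never evaluated on its own training data.

    fit_nuisances(train_idx) -> (pi_b_hat, mu_hat), each shaped (n, K):
    predictions for *all* units from models fit only on train_idx.
    """
    n = len(A)
    perm = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(perm, n_folds)
    scores = np.empty(n)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k])
        pi_b_hat, mu_hat = fit_nuisances(train_idx)
        a = A[test_idx]
        resid = (pi_e[test_idx, a] / pi_b_hat[test_idx, a]
                 * (Y[test_idx] - mu_hat[test_idx, a]))
        direct = (pi_e[test_idx] * mu_hat[test_idx]).sum(axis=1)
        scores[test_idx] = resid + direct
    return float(scores.mean())
```

The fold structure is the i.i.d. analogue of the ADR's time-indexed splitting: both ensure the nuisance fit is independent of the point at which it is evaluated.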
6. Extensions and Connections to Modern Theory
The doubly robust paradigm recurs throughout modern semiparametric theory, connecting to orthogonal estimating equations, efficient score construction, and modern sample-splitting/cross-fitting analysis [Chernozhukov et al., 2018]. ADR estimators have close analogues in double/debiased machine learning, adaptive causal inference, and off-policy evaluation. The techniques extend naturally to high-dimensional covariate settings, adaptive bandit models, policy evaluation under dependence, and model selection environments.
7. Summary Table: Classical DR vs. ADR in Adaptive Experiments
| Estimator | Setting | Asymptotic Normality | Finite Sample Stability | Learners for Nuisances |
|---|---|---|---|---|
| Classical DR | i.i.d. | standard CLT | unstable if $\pi^b(a \mid x)$ near 0 or 1 | logistic regression, trees, forests |
| ADR | adaptive/dependent | martingale CLT with sample-splitting | stable via truncated $\hat{\pi}_{t-1}$, regularization | same, requires adaptive fitting |
ADR estimators unify the theory of DR inference in adaptive and dependent data settings and empirically outperform classical AIPW approaches under time-varying policies, demonstrating robustness, efficiency, and finite-sample advantages (Kato et al., 2020).