Causal Inference with High-dimensional Discrete Covariates
Abstract: When estimating causal effects from observational studies, researchers often need to adjust for many covariates to deconfound the non-causal relationship between exposure and outcome, and many of these covariates are discrete. The behavior of commonly used estimators in the presence of many discrete covariates is not well understood, since their properties are often analyzed under structural assumptions, including sparsity and smoothness, which do not apply in discrete settings. In this work, we study the estimation of causal effects in a model where the covariates required for confounding adjustment are discrete but high-dimensional, meaning the number of categories $d$ is comparable with or even larger than the sample size $n$. Specifically, we show the mean squared error of commonly used regression, weighting and doubly robust estimators is bounded by $\frac{d^2}{n^2}+\frac{1}{n}$. We then prove that the minimax lower bound for the average treatment effect is of order $\frac{d^2}{n^2 \log^2 n}+\frac{1}{n}$, which characterizes the fundamental difficulty of causal effect estimation in the high-dimensional discrete setting and shows that the estimators mentioned above are rate-optimal up to log factors. We further consider additional structures that can be exploited, namely effect homogeneity and prior knowledge of the covariate distribution, and propose new estimators that enjoy faster convergence rates of order $\frac{d}{n^2} + \frac{1}{n}$ and therefore achieve consistency in a broader regime. The results are illustrated empirically via simulation studies.
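To make the setting concrete, the following is a minimal sketch of a stratified plug-in (regression) estimator of the average treatment effect when the adjustment covariate takes values in $d$ categories. The abstract does not spell out the estimators, so this is the textbook cell-means version and not necessarily the paper's exact construction. The function names `plugin_ate` and `simulate`, the data-generating process, and the rule of skipping cells that lack a treated or a control unit are all assumptions made for illustration.

```python
import random


def plugin_ate(x, a, y, d):
    """Stratified plug-in ATE estimate for a covariate x in {0, ..., d-1}.

    Assumption (illustrative, not from the paper): cells without both a
    treated and a control unit are skipped and the weights renormalized.
    """
    n1, n0 = [0] * d, [0] * d        # treated / control counts per cell
    s1, s0 = [0.0] * d, [0.0] * d    # treated / control outcome sums per cell
    for xi, ai, yi in zip(x, a, y):
        if ai == 1:
            n1[xi] += 1
            s1[xi] += yi
        else:
            n0[xi] += 1
            s0[xi] += yi

    total, used = 0.0, 0
    for k in range(d):
        if n1[k] > 0 and n0[k] > 0:
            weight = n1[k] + n0[k]   # empirical mass of cell k
            total += weight * (s1[k] / n1[k] - s0[k] / n0[k])
            used += weight
    return total / used if used else float("nan")


def simulate(n, d, seed=0):
    """Hypothetical confounded data with a true ATE of 1."""
    rng = random.Random(seed)
    x = [rng.randrange(d) for _ in range(n)]
    # Treatment probability depends on the category, which creates confounding.
    a = [1 if rng.random() < 0.3 + 0.4 * (xi % 2) else 0 for xi in x]
    # Baseline outcome also depends on the category; the treatment adds 1.
    y = [(xi % 2) + 1.0 * ai + rng.gauss(0.0, 1.0) for xi, ai in zip(x, a)]
    return x, a, y


if __name__ == "__main__":
    for d in (10, 1000, 20000):
        x, a, y = simulate(n=20000, d=d, seed=1)
        print(d, plugin_ate(x, a, y, d))
```

Running the script with $d$ small relative to $n$ and then with $d$ close to $n$ gives a feel for how sparsely populated cells affect the estimate, which is the regime the abstract describes.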