Off-Policy Evaluation with Out-of-Sample Guarantees
Abstract: We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (also known as disutility or negative reward), and the main problem is to make valid inferences about its out-of-sample loss when the past data was observed under a different and possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees about the entire loss distribution, rather than just its mean. Importantly, the method takes into account model misspecifications of the past policy, including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.
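One way to picture a finite-sample bound on the loss distribution is a weighted order-statistic quantile, in the style of split conformal prediction, computed on a held-out part of the data. The sketch below is illustrative only and is not claimed to be the procedure of the paper. The function name `loss_quantile_bound`, the assumption that the importance weights (target policy over past policy) are known exactly, and the cap `w_max` on the weight of a new sample are all assumptions introduced here. Handling misspecified weights or unmeasured confounding would require replacing the exact weights with bounds, which this sketch does not attempt.

```python
import numpy as np


def loss_quantile_bound(losses, weights, w_max, alpha):
    """Return ell such that, under the stated assumptions, a new loss
    under the target policy is at most ell with probability >= 1 - alpha.

    losses  : held-out losses observed under the past policy
    weights : importance weights for those samples (assumed exact here)
    w_max   : assumed upper bound on the weight of an unseen sample
    alpha   : miscoverage level in (0, 1)
    """
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Sort the held-out losses and carry their weights along.
    order = np.argsort(losses)
    sorted_losses = losses[order]

    # Cumulative weighted mass of the held-out losses. The unseen sample
    # contributes mass w_max, conservatively placed at +infinity.
    cum_mass = np.cumsum(weights[order]) / (weights.sum() + w_max)

    # First index whose cumulative mass reaches 1 - alpha.
    idx = np.searchsorted(cum_mass, 1.0 - alpha, side="left")
    if idx >= len(sorted_losses):
        return np.inf  # too few samples for an informative bound
    return float(sorted_losses[idx])


# With unit weights this reduces to the classical order-statistic bound:
# the ceil((1 - alpha) * (n + 1))-th smallest held-out loss.
bound = loss_quantile_bound(np.arange(1, 10), np.ones(9), w_max=1.0, alpha=0.25)
```

Sweeping `alpha` over a grid traces out a bound on the whole loss distribution rather than a single summary such as the mean. When the held-out set is too small for the requested level, the sketch returns infinity instead of an invalid finite value.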