Off-Policy Evaluation with Out-of-Sample Guarantees

Published 20 Jan 2023 in stat.ML and cs.LG | (2301.08649v3)

Abstract: We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (a.k.a. disutility or negative reward), and the main problem is making valid inferences about its out-of-sample loss when the past data was observed under a different and possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees about the entire loss distribution, rather than just its mean. Importantly, the method takes into account model misspecifications of the past policy, including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under a specified range of credible model assumptions.
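
To make the idea of a coverage guarantee on the loss distribution concrete, the following is a minimal sketch in the spirit of weighted conformal prediction. It is not the paper's algorithm. It assumes the importance weights (target-policy probability divided by past-policy probability of the logged action) are known and that there is no unmeasured confounding, which the paper explicitly relaxes. The function name `loss_upper_bound` and the parameter `w_test_max` (an assumed upper bound on the weight of a future sample) are hypothetical names introduced for illustration.

```python
import numpy as np


def loss_upper_bound(losses, weights, w_test_max, alpha):
    """Weighted-quantile upper bound on a future loss (illustrative sketch).

    losses     : logged losses from the evaluation fold
    weights    : importance weights of the logged samples (assumed known)
    w_test_max : assumed upper bound on the weight of a future sample
    alpha      : target miscoverage level, so the target coverage is 1 - alpha
    """
    losses = np.asarray(losses, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Sort the losses and accumulate their weights in the same order.
    order = np.argsort(losses)
    sorted_losses = losses[order]
    cum_weights = np.cumsum(weights[order])

    # The unseen future sample is treated as a point mass at +infinity.
    total = cum_weights[-1] + w_test_max

    # First sorted position whose normalized cumulative weight reaches 1 - alpha.
    idx = np.searchsorted(cum_weights / total, 1.0 - alpha, side="left")

    # If the logged data cannot reach the level, the bound is uninformative.
    if idx >= len(sorted_losses):
        return np.inf
    return sorted_losses[idx]


# Toy usage with equal weights, i.e. the target policy equals the past policy.
toy_losses = [1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_weights = [1.0] * 9
bound = loss_upper_bound(toy_losses, toy_weights, w_test_max=1.0, alpha=0.25)
print(bound)
```

In a sample-splitting setup, one fold would typically be used to fit any model of the past policy and the other fold to compute the quantile as above. Sweeping `alpha` over a grid traces out a bound on the whole loss distribution rather than a single summary such as the mean.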
