Counterfactual explainability of black-box prediction models
Abstract: It is crucial to be able to explain black-box prediction models to use them effectively and safely in practice. Most existing tools for model explanations are associational rather than causal, and we use two paradoxical examples to show that such explanations are generally inadequate. Motivated by the concept of genetic heritability in twin studies, we propose a new notion called counterfactual explainability for black-box prediction models. Counterfactual explainability has three key advantages: (1) it leverages counterfactual outcomes and extends methods for global sensitivity analysis (such as functional analysis of variance and Sobol's indices) to a causal setting; (2) it is defined not only for the totality of a set of input factors but also for their interactions (indeed, it is a probability measure on a whole "explanation algebra"); (3) it also applies to dependent input factors whose causal relationship can be modeled by a directed acyclic graph, thus incorporating causal mechanisms into the explanation.
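For readers unfamiliar with the global sensitivity analysis tools that the abstract says are being extended, the following is a minimal sketch of the classical, associational first-order Sobol index, S_j = Var(E[f(X) | X_j]) / Var(f(X)), estimated by a standard pick-freeze Monte Carlo scheme. It is not the counterfactual explainability measure proposed in the paper. The function names `first_order_sobol` and `toy_model`, the independent standard normal inputs, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np


def first_order_sobol(f, d, n=200_000, seed=0):
    """Pick-freeze estimate of first-order Sobol indices.

    Assumes (for illustration only) that the d inputs are independent
    standard normal variables and that f maps an (n, d) array to n outputs.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))  # first independent sample
    B = rng.standard_normal((n, d))  # second independent sample
    yA = f(A)
    total_var = yA.var()
    indices = np.empty(d)
    for j in range(d):
        # "Freeze" input j at its value in A and redraw all other inputs from B.
        C = B.copy()
        C[:, j] = A[:, j]
        yC = f(C)
        # Cov(f(A), f(C)) equals Var(E[f | X_j]) when the inputs are independent.
        indices[j] = (np.mean(yA * yC) - yA.mean() * yC.mean()) / total_var
    return indices


def toy_model(x):
    # Hypothetical black box: two main effects plus an X1 * X3 interaction.
    return x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 2]


S = first_order_sobol(toy_model, d=3)
print(S)
```

For this toy model the total variance is 6, so the exact first-order indices are 1/6, 2/3, and 0. They sum to 5/6 rather than 1, and the remaining 1/6 belongs to the interaction between the first and third inputs. Attributing variance to interactions as well as to individual factors is the kind of accounting that the abstract says counterfactual explainability carries over to a causal setting.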