
Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead

Published 26 Nov 2018 in stat.ML and cs.LG | (1811.10154v3)

Abstract: Black box machine learning models are currently being used for high stakes decision-making throughout society, causing problems throughout healthcare, criminal justice, and in other domains. People have hoped that creating methods for explaining these black box models will alleviate some of these problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practices and can potentially cause catastrophic harm to society. There is a way forward -- it is to design models that are inherently interpretable. This manuscript clarifies the chasm between explaining black boxes and using inherently interpretable models, outlines several key reasons why explainable black boxes should be avoided in high-stakes decisions, identifies challenges to interpretable machine learning, and provides several example applications where interpretable models could potentially replace black box models in criminal justice, healthcare, and computer vision.

Citations (211)

Summary

  • The paper demonstrates that inherent interpretability in ML models can achieve accuracy comparable to black box methods in high-stakes environments.
  • It critiques post hoc explanations for black boxes, highlighting their limited fidelity and the risk of misleading critical decision-making.
  • The study advocates transitioning to interpretable models, using techniques like the CORELS algorithm to enhance transparency and ethical decision processes.

Interpretability in Machine Learning: A Critical Analysis of Black Box Models for High-Stakes Decisions

Cynthia Rudin's paper presents a compelling argument against using black box machine learning models in high-stakes decision environments such as healthcare and criminal justice. The central thesis advocates a transition from post hoc explanations of black box models to the development and use of inherently interpretable models.

The Argument Against Explaining Black Boxes

The paper outlines key issues with explainable machine learning, particularly in environments where decisions have significant social consequences. Rudin challenges the prevalent notion of a necessary trade-off between model accuracy and interpretability, providing evidence that interpretable models can achieve similar levels of accuracy to complex black box models in many cases. Furthermore, the paper criticizes the reliability of explanations derived from black boxes, arguing that explanations necessarily lack perfect fidelity to the original model: an explanation that agreed with the black box everywhere would simply be the model itself, so any genuine explanation must be wrong somewhere. This critical perspective raises concerns about trust in machine learning systems, as explanations for high-stakes applications are often required to be both understandable and accurate.

Pragmatic and Ethical Considerations

Pragmatically, black boxes with explanations can create complex decision pathways prone to human error, especially when integrating external information into risk assessments. Ethically, Rudin addresses the potential harm caused by opaque decision-making processes, emphasizing the societal impact of models used for parole or bail decisions and in healthcare settings. Such settings demand transparency and accountability, which black box models inherently lack.

The Case for Interpretable Models

Rudin argues that interpretable models should be prioritized over explainable black boxes. Interpretable models not only offer transparency but also facilitate easier incorporation of domain-specific knowledge and constraints. These models can be especially beneficial in terms of safety and trust, mitigating risks associated with the use of complex black box models and their often unreliable explanations.
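One model class Rudin points to as interpretable in this sense is the small integer scoring system, where every contribution to a decision can be audited directly. The sketch below is illustrative only; the risk factors, point values, and threshold are invented, not taken from the paper:

```python
# A minimal sketch of an integer scoring system, an interpretable model
# class where a clinician or judge can audit every contribution.
# Features, points, and the threshold are hypothetical.

POINTS = {          # hypothetical risk factors -> integer points
    "prior_offense": 2,
    "age_under_25": 1,
    "unemployed": 1,
}
THRESHOLD = 3       # predict high risk if total points >= threshold

def risk_score(features):
    """Sum the points of the factors present; transparent by construction."""
    return sum(POINTS[f] for f in features if f in POINTS)

def predict_high_risk(features):
    return risk_score(features) >= THRESHOLD

print(risk_score(["prior_offense", "age_under_25"]))  # -> 3
print(predict_high_risk(["unemployed"]))              # -> False
```

Because the weights are small integers over a handful of named factors, a domain expert can check each rule against clinical or legal knowledge, which is exactly the kind of constraint incorporation the paragraph above describes.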

Challenges in Developing Interpretable Models

Despite the advantages, developing interpretable models is not without challenges. The paper acknowledges the computational complexity often involved in creating such models. Constructing interpretable models requires significant effort in terms of both computational resources and domain expertise. However, advancements in optimization techniques and algorithm design provide a path forward. Rudin's work with the CORELS algorithm, which produces certifiably optimal rule lists for categorical data, exemplifies successful approaches to overcoming these challenges.
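To make the model class concrete: a rule list is an ordered sequence of IF-THEN rules evaluated top to bottom, with a default prediction if no rule fires. CORELS searches this class for a provably optimal list; the toy evaluator and rules below are a hypothetical sketch, not the CORELS implementation or any rule list from the paper:

```python
# Illustrative sketch of a rule list: ordered IF-THEN rules checked top
# to bottom, falling through to a default prediction. The feature names
# ("priors", "age") and thresholds are invented for illustration.

def predict(rule_list, default, x):
    """Return the prediction of the first rule whose condition matches x."""
    for condition, prediction in rule_list:
        if condition(x):
            return prediction
    return default

# Hypothetical recidivism-style rule list over two features.
rules = [
    (lambda x: x["priors"] > 3, 1),
    (lambda x: x["age"] < 21 and x["priors"] > 0, 1),
]

print(predict(rules, 0, {"age": 35, "priors": 0}))  # -> 0
print(predict(rules, 0, {"age": 19, "priors": 2}))  # -> 1
```

The hard part, and the source of the computational complexity the paper notes, is not evaluating such a list but searching the combinatorial space of candidate lists for one that is both short and accurate.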

Implications and Future Directions

The paper implies that a shift towards inherently interpretable machine learning models could significantly improve decision-making in high-stakes fields. In terms of policy, Rudin suggests mandating institutions to explore interpretable models and report their accuracies compared to black boxes. This could encourage responsible ML governance and push for transparency and fairness in decision models. Theoretically, the paper invites further exploration of the Rashomon set (the set of models that perform almost equally well on a given problem) and its implication that an accurate yet interpretable model often exists across diverse domains.
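The Rashomon set idea can be made concrete with a toy example: enumerate a simple model class, find the best accuracy, and collect every model within a tolerance epsilon of it. The data, model class (single thresholds on one feature), and epsilon below are made up purely for illustration:

```python
# Toy illustration of the Rashomon set: all models whose accuracy is
# within epsilon of the best achievable on the data. The model class
# here is single-threshold classifiers on a 1-D feature; the dataset
# and epsilon are invented for illustration.

X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

def accuracy(threshold):
    preds = [1 if x > threshold else 0 for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)

thresholds = [0.5 + i for i in range(8)]
best = max(accuracy(t) for t in thresholds)
epsilon = 0.125

rashomon_set = [t for t in thresholds if accuracy(t) >= best - epsilon]
print(rashomon_set)  # -> [2.5, 3.5, 4.5, 5.5, 6.5]
```

Even in this tiny example several thresholds are nearly tied for best accuracy; the argument is that when the Rashomon set of a real problem is similarly large, it is likely to contain at least one model that is also interpretable.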

Conclusion

Rudin's paper is a critical examination of the status quo in machine learning for decision-making processes in high-stakes applications. It effectively argues for a paradigm shift towards interpretable models, providing both philosophical reasoning and empirical evidence. This paper is a call to the research community to prioritize transparency over complexity, urging advancements that could potentially have substantial implications on societal trust and decision-making frameworks in critical fields.
