- The paper presents SHAP, a novel framework that unifies multiple additive feature attribution methods for model interpretation.
- The methodology guarantees local accuracy, missingness, and consistency, establishing SHAP values as the unique solution for feature attribution.
- Empirical evaluations demonstrate SHAP’s computational efficiency and its alignment with human judgment in assessing feature importance.
A Unified Approach to Interpreting Model Predictions
Introduction
The challenge of interpreting predictions from complex models is crucial in numerous applications. While simple models such as linear regression offer ease of interpretability, they often fall short in accuracy compared to more complex models like ensemble methods or deep neural networks. This paper introduces a unified framework, SHAP (SHapley Additive exPlanations), that aims to bridge this gap by providing a consistent approach to model interpretation. SHAP builds on cooperative game theory to offer a single solution for feature importance that adheres to key properties required for effective interpretation.
Additive Feature Attribution Methods
The core of SHAP lies in the identification of additive feature attribution methods, a class of explanation models that are linear functions of binary variables indicating feature presence. Each feature is attributed an effect, and the sum of these effects approximates the prediction of the original model. The paper demonstrates that six established methods, including LIME and DeepLIFT, fall within this class, emphasizing the generality of the proposed framework. SHAP then introduces three properties (local accuracy, missingness, and consistency) that uniquely determine SHAP values as the solution within this class.
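The additive form described above can be sketched in a few lines. This is a minimal illustration, with hypothetical attribution values chosen for the example; it is not the paper's estimation procedure.

```python
def explanation(phi_0, phis, z):
    """Additive feature attribution: g(z') = phi_0 + sum_i phi_i * z'_i,
    where z' is a binary vector marking which simplified features are
    present. The result approximates the original model's prediction."""
    return phi_0 + sum(phi_i * z_i for phi_i, z_i in zip(phis, z))

# Hypothetical example: base value 0.5, two features contributing
# +0.3 and -0.1; with both features present the explanation sums to 0.7.
print(explanation(0.5, [0.3, -0.1], [1, 1]))  # 0.7
```

The key point is that the attribution values, not the functional form, distinguish methods within this class.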
Uniqueness and Properties of SHAP
A significant advancement presented is the proof that SHAP values are the only solution within the class of additive feature attribution methods that simultaneously satisfies local accuracy, missingness, and consistency. Local accuracy ensures the explanation model matches the original model's prediction for the specific input being explained; missingness requires that features absent from the input receive zero attribution; and consistency ensures a feature's attribution does not decrease when the model changes so that the feature's contribution increases.
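The unique solution is the classic Shapley value from game theory: each feature's attribution is its contribution to the model output, averaged over all orderings in which features can be added. The following sketch computes exact Shapley values by enumerating coalitions for a toy set function (the function and its values are illustrative assumptions, not from the paper), and shows that local accuracy holds: the attributions sum to the difference between the full prediction and the base value.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, n):
    """Exact Shapley values for a set function f over n features,
    where f maps a frozenset of feature indices to a model output.
    phi_i = sum over subsets S not containing i of
            |S|! (n - |S| - 1)! / n! * (f(S + {i}) - f(S))."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                S = frozenset(S)
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += weight * (f(S | {i}) - f(S))
    return phi

# Toy model: feature 0 alone contributes 1.0, and the pair {0, 1}
# adds an interaction effect of 0.5.
def f(S):
    return (1.0 if 0 in S else 0.0) + (0.5 if {0, 1} <= S else 0.0)

phi = shapley_values(f, 2)
print(phi)  # [1.25, 0.25]
# Local accuracy: attributions sum to f(all features) - f(empty set).
assert abs(sum(phi) - (f(frozenset({0, 1})) - f(frozenset()))) < 1e-9
```

Note the interaction effect is split evenly between the two features, which is characteristic of Shapley values. The enumeration is exponential in n, which motivates the approximation methods discussed next.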
SHAP Value Estimation Methods
The paper delineates several methods for estimating SHAP values, spanning both model-agnostic and model-specific approaches. Kernel SHAP and Shapley sampling values apply sampling approximations, with Kernel SHAP exploiting a connection between Shapley values and weighted linear regression to improve sample efficiency. Model-specific methods like Linear SHAP use the structure of linear models to compute SHAP values directly, while Deep SHAP extends DeepLIFT's recursive attribution rules, achieving computational efficiency in explaining deep networks.
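The regression connection behind Kernel SHAP can be sketched as follows: fitting a weighted linear model over coalition vectors, with weights given by the Shapley kernel, recovers the Shapley values. This illustration enumerates all coalitions rather than sampling (the toy set function is an assumption for demonstration, and the infinite kernel weights on the empty and full coalitions are approximated with a large constant rather than hard constraints).

```python
import numpy as np
from itertools import product
from math import comb

def kernel_shap(f, n):
    """Sketch of Kernel SHAP: weighted least squares over coalition
    vectors z, with Shapley kernel weights
    pi(z) = (n - 1) / (C(n, |z|) * |z| * (n - |z|)).
    With all 2^n coalitions enumerated, this recovers exact Shapley values."""
    Z, y, w = [], [], []
    for z in product([0, 1], repeat=n):
        s = sum(z)
        if s in (0, n):
            weight = 1e6  # stands in for the infinite weight that pins phi_0 and local accuracy
        else:
            weight = (n - 1) / (comb(n, s) * s * (n - s))
        Z.append((1,) + z)  # leading 1 models the base value phi_0
        y.append(f(frozenset(i for i, zi in enumerate(z) if zi)))
        w.append(weight)
    Z, y, w = np.array(Z, float), np.array(y, float), np.array(w, float)
    ZtW = Z.T * w  # apply weights column-wise
    coeffs = np.linalg.solve(ZtW @ Z, ZtW @ y)
    return coeffs[1:]  # drop phi_0, return per-feature attributions

# Same toy model as before: feature 0 contributes 1.0, the pair adds 0.5.
def f(S):
    return (1.0 if 0 in S else 0.0) + (0.5 if {0, 1} <= S else 0.0)

print(kernel_shap(f, 2))  # approximately [1.25, 0.25]
```

In practice Kernel SHAP samples coalitions rather than enumerating them, and the regression targets come from evaluating the model with the excluded features marginalized out; the regression structure is what gives it better sample efficiency than naive Shapley sampling.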
Computational and User Study Experiments
Empirical evaluations underline SHAP's computational efficiency and its closer alignment with human intuition relative to alternative methods. Kernel SHAP exhibits superior sample efficiency, providing accurate feature importance estimates with fewer model evaluations than existing sampling-based approaches. User studies show that SHAP values agree with human judgments of credit assignment more often than competing attributions, offering more intuitive explanations.
Practical Implications and Future Work
SHAP's unified approach holds considerable promise for enhancing model interpretability across diverse domains. By justifying the choice of SHAP values through rigorous theoretical and empirical evaluations, the paper sets a foundation for future exploration into faster computation methods, integration of interaction effects, and expansion into new explanation model classes.
Conclusion
The SHAP framework systematically unifies existing feature attribution methods within a single mathematical framework, addressing the long-standing trade-off between model complexity and interpretability. The paper's results pave the way for advancing interpretable machine learning, providing a principled method for interpreting predictions from complex models. The promise of SHAP lies in its adherence to crucial interpretability properties and its foundation in game theory, offering a practical tool for researchers and practitioners seeking to understand model predictions.