The Disagreement Problem in Faithfulness Metrics

Published 13 Nov 2023 in cs.LG and cs.AI | arXiv:2311.07763v1

Abstract: The field of explainable artificial intelligence (XAI) aims to explain how black-box machine learning models work. Much of the work centers around the holy grail of providing post-hoc feature attributions to any model architecture. While the pace of innovation around novel methods has slowed down, the question remains of how to choose a method, and how to make it fit for purpose. Recently, efforts around benchmarking XAI methods have suggested metrics for that purpose -- but there are many choices. That bounty of choice still leaves an end user unclear on how to proceed. This paper focuses on comparing metrics with the aim of measuring faithfulness of local explanations on tabular classification problems -- and shows that the current metrics don't agree, leaving users unsure how to choose the most faithful explanations.


Summary

  • The paper reveals that common faithfulness metrics for AI explanations often conflict, complicating the validation of model interpretations.
  • It employs local explanation methods like Deep SHAP, KernelSHAP, and Integrated Gradients across diverse models to benchmark metric reliability.
  • Findings indicate that perturbation-based metrics are highly sensitive to parameter choices, highlighting the need for context-specific calibration.

In the field of explainable artificial intelligence (XAI), a significant challenge is determining how well various methods and metrics measure the "faithfulness" of post-hoc explanations provided for AI model predictions. These explanations assign importance to input features to clarify how an AI model arrives at its conclusions. However, despite an array of available metrics to assess faithfulness, these metrics often conflict, causing confusion among practitioners as to which should be employed to confirm that explanations accurately represent a model’s reasoning.
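To make the idea of feature attribution concrete, here is a minimal illustrative sketch (not code from the paper) for the simplest case, a linear model. For linear models, methods such as Integrated Gradients and Shapley values both reduce to attributing `w_i * (x_i - b_i)` to feature `i`, relative to a baseline input `b`:

```python
# Illustrative sketch (not the paper's code): feature attributions for a
# linear model f(x) = w . x, relative to a baseline input. For linear
# models, Integrated Gradients and Shapley values both reduce to
# w_i * (x_i - baseline_i) for feature i.

def linear_attributions(w, x, baseline):
    """Per-feature attribution of f(x) - f(baseline) for a linear model."""
    return [wi * (xi - bi) for wi, xi, bi in zip(w, x, baseline)]

w = [2.0, -1.0, 0.5]          # model weights
x = [1.0, 3.0, 2.0]           # instance being explained
baseline = [0.0, 0.0, 0.0]    # reference ("absent feature") input

attr = linear_attributions(w, x, baseline)

# Completeness: the attributions sum to f(x) - f(baseline).
f = lambda v: sum(wi * vi for wi, vi in zip(w, v))
assert abs(sum(attr) - (f(x) - f(baseline))) < 1e-9
print(attr)  # [2.0, -3.0, 1.0]
```

The interesting (and contested) cases are non-linear models, where different attribution methods, and different baselines, no longer coincide.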

To investigate these discrepancies, the study applied a range of local explanation methods to linear and non-linear models across multiple datasets. Local explanation methods generate attributions for individual predictions, allowing users to understand the rationale behind specific decisions made by the model. The study covered popular attribution techniques such as Deep SHAP, KernelSHAP, and Integrated Gradients, each paired with different baselines to account for variability, and the metrics assessed included novel uses of ablation and topological data analysis (TDA).
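In the spirit of the ablation approach mentioned above, a hedged toy sketch (the study's actual implementation is not shown here) of ablation-based attribution looks like this: the importance of a feature is the change in the model's output when that feature is replaced by a baseline value.

```python
# Hedged toy sketch of ablation-based attribution (illustrative only):
# importance of feature i = change in output when feature i is replaced
# by its baseline value. The "model" is a toy nonlinear function standing
# in for a trained classifier.

def model(x):
    # toy nonlinear black box
    return x[0] * x[1] + 2.0 * x[2]

def ablation_attributions(f, x, baseline):
    out = f(x)
    attrs = []
    for i in range(len(x)):
        z = list(x)
        z[i] = baseline[i]              # ablate one feature at a time
        attrs.append(out - f(z))        # output drop = feature importance
    return attrs

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
print(ablation_attributions(model, x, baseline))  # [2.0, 2.0, 6.0]
```

Note that unlike Shapley-based attributions, these scores need not sum to `f(x) - f(baseline)` when features interact (here they sum to 10.0, but the output gap is 8.0), which is one reason different attribution methods disagree in the first place.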

The study provided an extensive comparison of faithfulness metrics by implementing a ranking system based on their assessments. Ideally, if all metrics were aligned, the rankings would converge, but the study's findings showed otherwise: there was little consensus on which set of explanations could be considered the most faithful. Perturbation-based metrics, in particular, demonstrated variability due to their sensitivity to parameter selection, such as the perturbation method and choice of features to perturb. These findings suggest that the selection of faithfulness metrics may be highly contextual and that more research is required to determine the most appropriate metrics for specific use cases.
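The parameter sensitivity of perturbation-based metrics can be sketched with a toy example (an illustration under assumed toy values, not the paper's experimental setup): when features interact with a reference point, perturbing to zero versus perturbing to a "mean" baseline can reverse which feature the metric deems most influential, and hence which explanation it scores as most faithful.

```python
# Hedged toy sketch: sensitivity of a perturbation-based importance check
# to the choice of perturbation baseline. Illustrative only; not the
# paper's implementation.

def model(x):
    # toy black box whose output depends on x[0]'s distance from 2.0
    return (x[0] - 2.0) ** 2 + x[1]

def perturbation_importance(f, x, baseline):
    """|change in output| when each feature is set to its baseline value."""
    out = f(x)
    scores = []
    for i in range(len(x)):
        z = list(x)
        z[i] = baseline[i]
        scores.append(abs(out - f(z)))
    return scores

x = [1.0, 0.5]
print(perturbation_importance(model, x, [0.0, 0.0]))  # [3.0, 0.5]
print(perturbation_importance(model, x, [2.0, 2.0]))  # [1.0, 1.5]
```

Under the zero perturbation, feature 0 looks most influential; under the mean-style baseline `[2.0, 2.0]`, feature 1 does. An explanation ranking feature 0 first would thus be scored as faithful by one metric configuration and unfaithful by the other, the kind of parameter sensitivity the study reports.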

In conclusion, the study highlighted a gap between theoretical XAI metrics and their practical utility. The absence of agreement among measures of faithfulness can leave users without clear guidance on how to choose the most appropriate explanations for their AI models. This underscores the need for a more refined understanding of these metrics and their implications in practice. The paper urges the XAI community to take note of these divergences and to further investigate the development of more harmonized benchmarks that can provide consistent and reliable guidance for evaluating the faithfulness of AI explanations.
