Do Concept Bottleneck Models Respect Localities?
Abstract: Concept-based explainability methods use human-understandable intermediaries to produce explanations for machine learning models. These methods assume concept predictions can help us understand a model's internal reasoning. In this work, we assess the degree to which such an assumption is true by analyzing whether concept predictors leverage "relevant" features to make predictions, a property we term locality. Concept-based models that fail to respect localities also fail to be explainable, because their concept predictions are based on spurious features, making the interpretation of those predictions vacuous. To assess whether concept-based models respect localities, we construct three metrics that characterize when models respect localities, complementing our analysis with theoretical results. Each metric captures a different notion of perturbation and assesses whether perturbing "irrelevant" features impacts the predictions made by a concept predictor. We find that many concept-based models used in practice fail to respect localities because their concept predictors cannot always clearly distinguish distinct concepts. Based on these findings, we propose suggestions for alleviating this issue.
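As a rough illustration of the perturbation-based idea behind such locality metrics (not the paper's exact definitions), the minimal sketch below perturbs only the features assumed irrelevant to a concept and measures how much the concept prediction moves. The `concept_predictor` interface, the Gaussian perturbation, and the mean-absolute-change score are illustrative assumptions introduced here for the example.

```python
import numpy as np

def locality_violation_score(concept_predictor, x, irrelevant_mask,
                             n_perturbations=20, noise_scale=1.0, seed=None):
    """Estimate how much a concept prediction changes when only features
    marked as irrelevant to that concept are perturbed.

    concept_predictor: callable mapping a (d,) feature vector to a concept
        probability in [0, 1] (hypothetical interface, not the paper's API).
    x: (d,) input feature vector.
    irrelevant_mask: boolean (d,) array, True for features assumed irrelevant
        to the concept under test.

    Returns the mean absolute change in the concept prediction; a
    locality-respecting predictor should score close to 0.
    """
    rng = np.random.default_rng(seed)
    base = concept_predictor(x)
    diffs = []
    for _ in range(n_perturbations):
        x_pert = x.copy()
        # Add Gaussian noise only to the features deemed irrelevant.
        noise = rng.normal(scale=noise_scale, size=int(irrelevant_mask.sum()))
        x_pert[irrelevant_mask] = x_pert[irrelevant_mask] + noise
        diffs.append(abs(concept_predictor(x_pert) - base))
    return float(np.mean(diffs))


if __name__ == "__main__":
    # Toy concept predictor that only looks at the first two features,
    # so perturbing the remaining two should barely change its output.
    def toy_predictor(x):
        return 1.0 / (1.0 + np.exp(-(x[0] + x[1])))

    x = np.array([0.5, -1.0, 2.0, 3.0])
    irrelevant = np.array([False, False, True, True])
    print(locality_violation_score(toy_predictor, x, irrelevant, seed=0))
```

Under this kind of check, a predictor that genuinely restricts itself to the relevant features scores near zero, while one that exploits spuriously correlated "irrelevant" features shows larger prediction shifts.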